mindocr
mindocr.data
mindocr.data.base_dataset
mindocr.data.base_dataset.BaseDataset
Bases: object
Base dataset to parse dataset files.
| PARAMETER | DESCRIPTION |
|---|---|
|  | names of elements in the output tuple of __getitem__ |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| data_list | source data items (e.g., containing image path and raw annotation) |
Source code in mindocr\data\base_dataset.py
mindocr.data.base_dataset.BaseDataset.get_output_columns()
Get the column names of the output tuple of __getitem__, required for data mapping in the next step.
Source code in mindocr\data\base_dataset.py
mindocr.data.builder
mindocr.data.builder.build_dataset(dataset_config, loader_config, num_shards=None, shard_id=None, is_train=True, **kwargs)
Build dataset for training and evaluation.
| PARAMETER | DESCRIPTION |
|---|---|
| dataset_config | dataset parsing and processing configuration, containing the keys: type (str) - dataset class name, chosen from the supported dataset classes |
| loader_config | dataloader configuration, containing the keys: batch_size (int) - batch size for the data loader; drop_remainder (bool) - whether to drop the last batch when the total number of samples is not divisible by batch_size; num_workers (int) - number of subprocesses used to fetch the dataset in parallel |
| num_shards | number of devices for distributed mode |
| shard_id | device id |
| is_train | whether it is in the training stage |
| **kwargs | optional args for extension |
Return
data_loader (Dataset): dataloader to generate data batch
Notes
- The main data process pipeline in MindSpore contains 3 parts: 1) load data files and generate source dataset, 2) perform per-data-row mapping such as image augmentation, 3) generate batch and apply batch mapping.
- Each of the three steps supports multiprocess. Detailed mechanism can be seen in https://www.mindspore.cn/docs/zh-CN/r2.0.0-alpha/api_python/mindspore.dataset.html
- A data row is a data tuple item containing multiple elements such as (image_i, mask_i, label_i). A data column corresponds to an element in the tuple like 'image', 'label'.
- The total `num_workers` used for data loading and processing should not be larger than the maximum number of CPU threads; otherwise it leads to resource-competition overhead. Especially for distributed training, `num_parallel_workers` should not be too large, to avoid thread competition.
Example
Load a DetDataset/RecDataset¶
```python
from mindocr.data import build_dataset

data_config = {
    "type": "DetDataset",
    "dataset_root": "path/to/datasets/",
    "data_dir": "ic15/det/train/ch4_test_images",
    "label_file": "ic15/det/train/det_gt.txt",
    "sample_ratio": 1.0,
    "shuffle": False,
    "transform_pipeline": [
        {"DecodeImage": {"img_mode": "RGB", "to_float32": False}},
        {"DetLabelEncode": {}},
    ],
    "output_columns": ["image", "polys", "ignore_tags"],
    "net_input_column_index": [0],
    "label_column_index": [1, 2],
}
loader_config = dict(shuffle=True, batch_size=16, drop_remainder=False, num_workers=1)
data_loader = build_dataset(data_config, loader_config, num_shards=1, shard_id=0, is_train=True)
```
Source code in mindocr\data\builder.py
mindocr.data.constants
Constant data-enhancement parameters for the ImageNet dataset
mindocr.data.det_dataset
mindocr.data.det_dataset.DetDataset
Bases: BaseDataset
General dataset for text detection. The annotation format should follow:

```text
# image file name	annotation info containing text and polygon points encoded by json.dumps
img_61.jpg	[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
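A label line in the format above can be parsed with the standard json module. The helper below is illustrative only (it is not part of the mindocr API) and assumes the image file name and the annotation are tab-separated:

```python
import json

def parse_det_label_line(line: str):
    """Split one detection label line into an image name and its annotation list.

    Assumes a single tab separates the image file name from the JSON annotation,
    matching the format shown above (hypothetical helper for illustration).
    """
    img_name, annot_str = line.rstrip("\n").split("\t", 1)
    annots = json.loads(annot_str)  # list of {"transcription": ..., "points": ...}
    return img_name, annots

line = 'img_61.jpg\t[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
name, annots = parse_det_label_line(line)
print(name, annots[0]["transcription"], len(annots[0]["points"]))
```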
| PARAMETER | DESCRIPTION |
|---|---|
| is_train | whether it is in the training stage |
| data_dir | directory of the image data |
| label_file | (list of) path(s) to the label file(s), where each line contains the image file name and its OCR annotation |
| sample_ratio | sample ratio(s) for the data items in the label files |
| shuffle | optional; if not given, shuffle = is_train |
| transform_pipeline | list of dicts, key - transform class name, value - a dict of param config, e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]; if None, the default transform pipeline for text detection is used |
| output_columns | required; the keys in the data dict that are expected to be output for the dataloader; if None, all data keys are returned |
| global_config | additional info used in data transformation; possible keys: character_dict_path |
| RETURNS | DESCRIPTION |
|---|---|
| data | Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item. |
Notes
- The data file structure should be like:

```text
├── data_dir
│   ├── 000001.jpg
│   ├── 000002.jpg
│   ├── {image_file_name}
├── label_file.txt
```
Source code in mindocr\data\det_dataset.py
mindocr.data.det_dataset.DetDataset.load_data_list(label_file, sample_ratio, shuffle=False, **kwargs)
Load the data list from label_file, which contains information on image paths and annotations.
| PARAMETER | DESCRIPTION |
|---|---|
| label_file | annotation file path(s) |
| shuffle | whether to shuffle the data list |
| RETURNS | DESCRIPTION |
|---|---|
| data | A list of annotation dicts, each containing the keys: img_path, annot... |
Source code in mindocr\data\det_dataset.py
mindocr.data.predict_dataset
Inference dataset class
mindocr.data.predict_dataset.PredictDataset
Bases: BaseDataset
- The data file structure should be like:

```text
├── img_dir
│   ├── 000001.jpg
│   ├── 000002.jpg
│   ├── {image_file_name}
```
Source code in mindocr\data\predict_dataset.py
mindocr.data.rec_dataset
mindocr.data.rec_dataset.RecDataset
Bases: DetDataset
General dataset for text recognition. The annotation format should follow:

```text
# image file name	ground truth text
word_18.png	STAGE
word_19.png	HarbourFront
```
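Each recognition label line pairs a file name with its transcription. The following sketch (not part of mindocr) shows the parsing under the assumption that the two fields are tab-separated:

```python
def parse_rec_label_line(line: str):
    """Split one recognition label line into (image file name, ground-truth text).

    Assumes a single tab separates the two fields, as in the example above;
    this helper is illustrative and not part of the mindocr API.
    """
    img_name, text = line.rstrip("\n").split("\t", 1)
    return img_name, text

print(parse_rec_label_line("word_19.png\tHarbourFront"))
```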
| PARAMETER | DESCRIPTION |
|---|---|
| is_train | whether it is in the training stage |
| data_dir | directory of the image data |
| label_file | (list of) path(s) to the label file(s), where each line contains the image file name and its OCR annotation |
| sample_ratio | sample ratio(s) for the data items in the label files |
| shuffle | optional; if not given, shuffle = is_train |
| transform_pipeline | list of dicts, key - transform class name, value - a dict of param config, e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]; if None, the default transform pipeline for text detection is used |
| output_columns | required; the keys in the data dict that are expected to be output for the dataloader; if None, all data keys are returned |
| global_config | additional info used in data transformation; possible keys: character_dict_path |
| RETURNS | DESCRIPTION |
|---|---|
| data | Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item. |
Notes
- The data file structure should be like:

```text
├── data_dir
│   ├── 000001.jpg
│   ├── 000002.jpg
│   ├── {image_file_name}
├── label_file.txt
```
Source code in mindocr\data\rec_dataset.py
mindocr.data.rec_lmdb_dataset
mindocr.data.rec_lmdb_dataset.LMDBDataset
Bases: BaseDataset
Data iterator for OCR datasets, including the ICDAR15 dataset.
The annotation format is required to be aligned with Paddle, which can be done using the converter.py script.
| PARAMETER | DESCRIPTION |
|---|---|
| is_train | whether the dataset is for training |
| data_dir | data root directory for LMDB dataset(s) |
| shuffle | optional; if not given, shuffle = is_train |
| transform_pipeline | list of dicts, key - transform class name, value - a dict of param config, e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]; if None, the default transform pipeline for text detection is used |
| output_columns | optional; the keys in the data dict that are expected to be output for the dataloader; if None, all data keys are returned |
| filter_max_len | whether to filter out records whose label is longer than max_text_len |
| max_text_len | the maximum text length the dataloader expects |
| RETURNS | DESCRIPTION |
|---|---|
| data | Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item. |
Notes
- The dataset file structure should follow:

```text
data_dir
├── dataset01
│   ├── data.mdb
│   ├── lock.mdb
├── dataset02
│   ├── data.mdb
│   ├── lock.mdb
├── ...
```
Source code in mindocr\data\rec_lmdb_dataset.py
mindocr.data.transforms
transforms init
mindocr.data.transforms.det_east_transforms
mindocr.data.transforms.det_east_transforms.EASTProcessTrain
Source code in mindocr\data\transforms\det_east_transforms.py
mindocr.data.transforms.det_transforms
transforms for text detection tasks.
mindocr.data.transforms.det_transforms.BorderMap
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.DetLabelEncode
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.DetLabelEncode.__call__(data)
Required keys
label (str): string containing points and transcriptions in JSON format
Added keys
polys (np.ndarray): polygon boxes in an image, each polygon represented by points, in shape [num_polygons, num_points, 2]
texts (List[str]): text strings
ignore_tags (np.ndarray[bool]): indicators for ignorable texts (e.g., '###')
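The transform described above can be approximated as follows. This is a minimal sketch, not the actual mindocr implementation: it omits validation and padding of polygons to a uniform point count, and the set of ignorable markers is an assumption:

```python
import json
import numpy as np

def det_label_encode(data: dict) -> dict:
    """Sketch of the DetLabelEncode logic: parse the JSON label string into
    polys / texts / ignore_tags. The real transform also validates and pads polygons."""
    annots = json.loads(data["label"])
    polys, texts, ignore_tags = [], [], []
    for ann in annots:
        polys.append(ann["points"])
        texts.append(ann["transcription"])
        # '###' (and similar markers) flag unreadable / ignorable text regions
        ignore_tags.append(ann["transcription"] in ("*", "###"))
    data["polys"] = np.array(polys, dtype=np.float32)
    data["texts"] = texts
    data["ignore_tags"] = np.array(ignore_tags, dtype=bool)
    return data

sample = {"label": '[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, '
                   '{"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]'}
out = det_label_encode(sample)
print(out["polys"].shape, out["texts"], out["ignore_tags"])
```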
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.DetResize
Resize the image and text polygons (if any) for text detection
| PARAMETER | DESCRIPTION |
|---|---|
| target_size | target size [H, W] of the output image, if it is not None |
| keep_ratio | whether to keep the aspect ratio. Default: True |
| padding | whether to pad the image to the target_size |
| limit_type | decides the resize method. Options: 'min', 'max', None. Default: 'min'. 'min': images are resized by limiting the minimum side length to limit_side_len |
| limit_side_len | side length limitation |
| force_divisable | whether to force the image to be resized to a size that is a multiple of divisor |
| divisor | divisor used when force_divisable is enabled |
| interpoloation | interpolation method |
Note
- The default choice limit_type='min' with a large limit_side_len is recommended for inference in detection for better accuracy.
- If target_size is set with keep_ratio=True, limit_type=null and padding=True, this transform works the same as ScalePadImage.
- If inference speed is the top priority, you can set limit_type='max' with a small limit_side_len like 960.
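The 'min'/'max' limiting logic can be sketched as a scale-factor computation. This is illustrative only; the real DetResize additionally handles target_size, padding, and divisor alignment:

```python
def limited_scale(h: int, w: int, limit_type: str = "min", limit_side_len: int = 736) -> float:
    """Return the scale factor implied by limit_type, as described above.

    'min': upscale so the shorter side reaches at least limit_side_len.
    'max': downscale so the longer side is at most limit_side_len.
    """
    if limit_type == "min":
        return max(1.0, limit_side_len / min(h, w))
    if limit_type == "max":
        return min(1.0, limit_side_len / max(h, w))
    return 1.0  # limit_type None: no limiting

print(limited_scale(720, 1280, "max", 960))  # -> 0.75, so the longer side 1280 becomes 960
```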
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.DetResize.__call__(data)
Required keys
Modified keys
image, (polys)
Added keys
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.GridResize
Bases: DetResize
Resize an image to make it exactly divisible by a specified factor. Resize polygons correspondingly, if provided.
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.RandomCropWithBBox
Randomly cuts a crop from an image along with its polygons such that the crop doesn't intersect any polygons (i.e. any given polygon is either fully inside or fully outside the crop).
| PARAMETER | DESCRIPTION |
|---|---|
| max_tries | number of attempts to cut a crop with a polygon in it; if it fails, the whole image is scaled to match the crop_size |
| min_crop_ratio | minimum size of a crop with respect to the input image size |
| crop_size | target size of the crop (resized and padded, if needed), preserving the aspect ratio |
| p | probability of the augmentation being applied to an image |
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.ScalePadImage
Bases: DetResize
Scale the image and polys by the shorter side, then pad to the target_size. Input image format: HWC.
| PARAMETER | DESCRIPTION |
|---|---|
| target_size | [H, W] of the output image |
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.ShrinkBinaryMap
Make a binary mask from detection data in ICDAR format.
Typically follows the process of the MakeICDARData class.
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.det_transforms.ValidatePolygons
Validate polygons by
- filtering out polygons outside an image.
- clipping coordinates of polygons that are partially outside an image to stay within the visible region.
| PARAMETER | DESCRIPTION |
|---|---|
| min_area | minimum area below which newly clipped polygons are considered ignored |
| clip_to_visible_area | (Experimental) clip polygons to the visible area; the number of vertices in a polygon may change after clipping |
| min_vertices | minimum number of vertices in a polygon below which newly clipped polygons are considered ignored |
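The area-based filtering can be illustrated with the shoelace formula. This is a sketch under stated assumptions: the real transform performs proper polygon clipping against the image boundary, which is omitted here:

```python
def polygon_area(points) -> float:
    """Absolute area of a simple polygon via the shoelace formula."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def mark_ignored(polys, min_area: float = 1.0):
    """Return one flag per polygon: True if its area falls below min_area,
    mirroring how ValidatePolygons treats too-small clipped polygons."""
    return [polygon_area(p) < min_area for p in polys]

square = [(0, 0), (10, 0), (10, 10), (0, 10)]    # area 100
sliver = [(0, 0), (1, 0), (1, 0.5), (0, 0.5)]    # area 0.5
print(mark_ignored([square, sliver], min_area=1.0))  # -> [False, True]
```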
Source code in mindocr\data\transforms\det_transforms.py
mindocr.data.transforms.general_transforms
mindocr.data.transforms.general_transforms.DecodeImage
img_mode (str): the channel order of the output, 'BGR' or 'RGB'. Defaults to 'BGR'.
channel_first (bool): if True, the image shape is CHW; if False, HWC. Defaults to False.
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.NormalizeImage
Normalize an image: subtract the mean and divide by the std.
Input image: by default np.uint8, [0, 255], HWC format.
Returned image: float32 numpy array.
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.PackLoaderInputs
| PARAMETER | DESCRIPTION |
|---|---|
| output_columns | the keys in the data dict that are expected to be output for the dataloader |
Call
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.RandomColorAdjust
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.RandomColorAdjust.__call__(data)
Required keys: image. Modified keys: image.
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.RandomHorizontalFlip
Random horizontal flip of an image with polygons in it (if any).
| PARAMETER | DESCRIPTION |
|---|---|
| p | probability of the augmentation being applied to an image |
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.RandomRotate
Randomly rotate an image with polygons in it (if any).
| PARAMETER | DESCRIPTION |
|---|---|
| degrees | range of angles [min, max] |
| expand_canvas | whether to expand the canvas during rotation (the image size will be increased) or maintain the original size (the rotated image will be cropped back to the original size) |
| p | probability of the augmentation being applied to an image |
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.RandomScale
Randomly scales an image and its polygons in a predefined scale range.
| PARAMETER | DESCRIPTION |
|---|---|
| scale_range | (min, max) scale range |
| p | probability of the augmentation being applied to an image |
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.general_transforms.RandomScale.__call__(data)
Required keys
image, HWC, (polys)
Modified keys
image, (polys)
Source code in mindocr\data\transforms\general_transforms.py
mindocr.data.transforms.rec_transforms
transforms for text recognition tasks.
mindocr.data.transforms.rec_transforms.RecAttnLabelEncode
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.RecAttnLabelEncode.__init__(max_text_len=25, character_dict_path=None, use_space_char=False, lower=False, **kwargs)
Convert a text label (str) to a sequence of character indices according to the char dictionary
| PARAMETER | DESCRIPTION |
|---|---|
| max_text_len | pad the label text to a fixed length (max_text_len) for attention loss computation |
| character_dict_path | path to the dictionary; if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") is used |
| use_space_char | if True, add the space char to the dict to recognize the space between two words |
| lower | if True, all upper-case chars in the label text are converted to lower case. Set True if the dictionary only contains lower-case chars; set False to recognize both upper-case and lower-case chars |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| go_idx | the index of the GO token |
| stop_idx | the index of the STOP token |
| num_valid_chars | the number of valid characters (including the space char if used) in the dictionary |
| num_classes | the number of classes (the valid characters plus the special token for blank padding) |
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.RecCTCLabelEncode
Bases: object
Convert a text label (str) to a sequence of character indices according to the char dictionary
| PARAMETER | DESCRIPTION |
|---|---|
| max_text_len | pad the label text to a fixed length (max_text_len) for CTC loss computation |
| character_dict_path | path to the dictionary; if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") is used |
| use_space_char | if True, add the space char to the dict to recognize the space between two words |
| blank_at_last | pad with the blank index (not the space index). If True, a blank/padding token is appended to the end of the dictionary, so that blank_index = num_chars, where num_chars is the number of characters in the dictionary including the space char if used. If False, the blank token is inserted at the beginning of the dictionary, so blank_index = 0 |
| lower | if True, all upper-case chars in the label text are converted to lower case. Set True if the dictionary only contains lower-case chars; set False to recognize both upper-case and lower-case chars |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| blank_idx | the index of the blank token used for padding |
| num_valid_chars | the number of valid characters (including the space char if used) in the dictionary |
| num_classes | the number of classes (the valid characters plus the special token for blank padding), so num_classes = num_valid_chars + 1 |
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.RecCTCLabelEncode.__call__(data)
Required keys
label (str): text string
Added keys
text_seq (np.ndarray, int32): sequence of character indices after padding to max_text_len, in shape (sequence_len); out-of-dictionary characters are skipped
length (np.int32): the number of valid chars in the encoded char index sequence, where valid means the char is in the dictionary
text_padded (str): text label padded to a fixed length, to solve the dynamic-shape issue in the dataloader
text_length (int): the length of the original text string label
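The encoding step can be sketched as follows. This is a simplified illustration with a toy dictionary, not the actual transform: it skips text_padded and assumes blank_at_last=True (blank appended after the 36 valid chars):

```python
import numpy as np

def ctc_encode(text: str, char_to_idx: dict, max_text_len: int, blank_idx: int):
    """Map a text label to a fixed-length index sequence for CTC loss.

    Out-of-dictionary characters are skipped; the sequence is padded with
    blank_idx up to max_text_len. Toy version of RecCTCLabelEncode.__call__."""
    indices = [char_to_idx[c] for c in text if c in char_to_idx]
    length = len(indices)
    padded = indices + [blank_idx] * (max_text_len - length)
    return np.array(padded, dtype=np.int32), np.int32(length)

# toy dictionary: '0'-'9' then 'a'-'z'; blank appended at the end (blank_at_last=True)
chars = "0123456789abcdefghijklmnopqrstuvwxyz"
char_to_idx = {c: i for i, c in enumerate(chars)}
blank_idx = len(chars)  # 36

seq, length = ctc_encode("stage", char_to_idx, max_text_len=25, blank_idx=blank_idx)
print(seq[:6], length)
```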
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.RecResizeImg
Bases: object
Adopted from the Paddle resize: converts HWC to CHW and rescales pixel values to [-1, 1]
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.RecResizeNormForInfer
Bases: object
Resize image for text recognition
| PARAMETER | DESCRIPTION |
|---|---|
| target_height | target height after resize; commonly 32 for CRNN, 48 for SVTR. Default: 32 |
| target_width | target width. Default: 320. If None, the image width is scaled to keep the aspect ratio unchanged |
| keep_ratio | whether to keep the aspect ratio. If True, resize the image with ratio = target_height / input_height (a certain image height is required by the recognition network). If False, simply resize to the target size |
| padding | if True, pad the resized image to the target size with zero RGB values |
Notes
- The default choice (keep_ratio=True, no padding) is suitable for inference for better accuracy.
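The width computation under keep_ratio=True can be sketched as follows. This is illustrative only; the real transform also normalizes pixel values, and the cap at target_width (for padding onto a fixed canvas) is an assumption:

```python
import math

def resized_width(h: int, w: int, target_height: int = 32,
                  target_width: int = 320, keep_ratio: bool = True) -> int:
    """Compute the output width for recognition resize, as described above.

    keep_ratio=True scales the width by target_height / h (capped at target_width,
    assuming padding to a fixed canvas); keep_ratio=False returns target_width."""
    if not keep_ratio:
        return target_width
    ratio = target_height / h
    return min(target_width, math.ceil(w * ratio))

print(resized_width(48, 160))  # -> 107, i.e. ceil(160 * 32 / 48)
```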
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.RecResizeNormForInfer.__call__(data)
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.Rotate90IfVertical
Rotate the image by 90 degrees when the height/width ratio is larger than the given threshold. Note: it needs to be called before image resize.
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.resize_norm_img(img, image_shape, padding=True, interpolation=cv2.INTER_LINEAR)
Resize image
| PARAMETER | DESCRIPTION |
|---|---|
| img | image of shape (H, W, C) |
| image_shape | image shape after resize, in (C, H, W) |
| padding | if True, resize while preserving the H/W ratio, then pad the blank area |
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.resize_norm_img_chinese(img, image_shape)
Adopted from Paddle
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.rec_transforms.str2idx(text, label_dict, max_text_len=23, lower=False)
Encode text (string) to a sequence of char indices
| PARAMETER | DESCRIPTION |
|---|---|
| text | text string |
| RETURNS | DESCRIPTION |
|---|---|
| char_indices | char index sequence |
Source code in mindocr\data\transforms\rec_transforms.py
mindocr.data.transforms.svtr_transform
mindocr.data.transforms.svtr_transform.CVRescale
Bases: object
Source code in mindocr\data\transforms\svtr_transform.py
mindocr.data.transforms.svtr_transform.CVRescale.__init__(factor=4, base_size=(128, 512))
Define image scales using a Gaussian pyramid and rescale the image to the target scale.
| PARAMETER | DESCRIPTION |
|---|---|
| factor | the decay factor from the base size; factor=4 keeps the target scale by default |
| base_size | base size used to build the bottom layer of the pyramid |
Source code in mindocr\data\transforms\svtr_transform.py
mindocr.data.transforms.transforms_factory
¶
Create and run transformations from a config or predefined transformation pipeline
mindocr.data.transforms.transforms_factory.create_transforms(transform_pipeline, global_config=None)
¶Create a sequence of callable transforms.
| PARAMETER | DESCRIPTION |
|---|---|
transform_pipeline |
list of callable instances, or dicts where each key is a transformation class name and its value is the args, e.g. [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}] or [DecodeImage(img_mode='BGR')]
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
list of data transformation functions |
Source code in mindocr\data\transforms\transforms_factory.py
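The config-to-callables mechanics can be sketched as follows. The registry and the DecodeImage stand-in class are hypothetical; the real implementation resolves class names against the registered transform modules.

```python
class DecodeImage:
    """Stand-in transform used only for this sketch."""
    def __init__(self, img_mode="BGR", channel_first=False):
        self.img_mode = img_mode
        self.channel_first = channel_first
    def __call__(self, data):
        data["img_mode"] = self.img_mode
        return data

def create_transforms(transform_pipeline, global_config=None):
    """Sketch: turn a list of {'ClassName': {args}} dicts (or ready-made
    callables) into a list of callable transform instances."""
    registry = {"DecodeImage": DecodeImage}  # hypothetical registry
    transforms = []
    for item in transform_pipeline:
        if callable(item):
            transforms.append(item)
        elif isinstance(item, dict):
            (name, args), = item.items()  # one class name per dict
            merged = dict(args or {}, **(global_config or {}))
            transforms.append(registry[name](**merged))
        else:
            raise TypeError(f"Unsupported transform spec: {item!r}")
    return transforms

pipeline = create_transforms([{"DecodeImage": {"img_mode": "BGR", "channel_first": False}}])
print(pipeline[0].img_mode)  # BGR
```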
mindocr.data.transforms.transforms_factory.transforms_dbnet_icdar15(phase='train')
¶Get pre-defined transform config for dbnet on icdar15 dataset.
| PARAMETER | DESCRIPTION |
|---|---|
phase |
train, eval, infer
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
list of dicts for the data transformation pipeline, which can be converted to functions by 'create_transforms' |
Source code in mindocr\data\transforms\transforms_factory.py
mindocr.losses
¶
mindocr.losses.builder
¶
mindocr.losses.builder.build_loss(name, **kwargs)
¶
Create the loss function.
| PARAMETER | DESCRIPTION |
|---|---|
name |
loss function name, exactly the same as one of the supported loss class names
TYPE:
|
Return
nn.LossBase
Example
Create a CTC Loss module¶
>>> from mindocr.losses import build_loss
>>> loss_func_name = "CTCLoss"
>>> loss_func_config = {"pred_seq_len": 25, "max_label_len": 24, "batch_size": 32}
>>> loss_fn = build_loss(loss_func_name, **loss_func_config)
>>> loss_fn
CTCLoss<>
Source code in mindocr\losses\builder.py
mindocr.losses.cls_loss
¶
mindocr.losses.cls_loss.CrossEntropySmooth
¶
Bases: nn.LossBase
Cross entropy loss with label smoothing.
Applies the softmax activation function to the input logits, and computes the cross entropy
between the logits and the label.
| PARAMETER | DESCRIPTION |
|---|---|
smoothing |
Label smoothing factor, a regularization tool used to prevent the model from overfitting when calculating Loss. The value range is [0.0, 1.0]. Default: 0.0.
DEFAULT:
|
aux_factor |
Auxiliary loss factor. Set aux_factor > 0.0 if the model has auxiliary logit outputs (i.e., deep supervision), like inception_v3. Default: 0.0.
DEFAULT:
|
reduction |
Apply specific reduction method to the output: 'mean' or 'sum'. Default: 'mean'.
DEFAULT:
|
weight |
Class weight. Shape [C]. A rescaling weight applied to the loss of each batch element. Data type must be float16 or float32.
TYPE:
|
Inputs
logits (Tensor or Tuple of Tensor): Input logits of shape [N, C], where N is the number of samples and C is the number of classes. A tuple of logits is supported in the order (main_logits, aux_logits) for auxiliary loss in networks like inception_v3.
labels (Tensor): Ground truth labels. Shape [N] or [N, C]. (1) Shape [N]: sparse labels representing class indices; must be int type. (2) Shape [N, C]: dense labels representing class probability values or one-hot labels; must be float type.
Source code in mindocr\losses\cls_loss.py
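The label-smoothing computation can be illustrated in NumPy. This is a sketch for sparse integer labels with mean reduction; auxiliary logits and class weights are omitted, and the function name is illustrative.

```python
import numpy as np

def cross_entropy_smooth(logits, labels, smoothing=0.0):
    """Sketch of softmax cross entropy with label smoothing (mean reduction)
    for sparse integer labels."""
    n, c = logits.shape
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # smoothed targets: 1 - smoothing on the true class, smoothing/(C-1) elsewhere
    targets = np.full((n, c), smoothing / (c - 1))
    targets[np.arange(n), labels] = 1.0 - smoothing
    return float(-(targets * log_probs).sum(axis=1).mean())

logits = np.array([[10.0, 0.0], [0.0, 10.0]])
labels = np.array([0, 1])
print(cross_entropy_smooth(logits, labels, smoothing=0.0))  # near 0: confident and correct
print(cross_entropy_smooth(logits, labels, smoothing=0.1))  # larger, due to smoothing
```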
mindocr.losses.det_loss
¶
mindocr.losses.det_loss.BalancedBCELoss
¶
Bases: nn.LossBase
Balanced cross entropy loss.
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.BalancedBCELoss.construct(pred, gt, mask)
¶| PARAMETER | DESCRIPTION |
|---|---|
pred |
shape :math:
|
gt |
shape :math:
|
mask |
shape :math:
|
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.DiceLoss
¶
Bases: nn.LossBase
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.DiceLoss.construct(pred, gt, mask)
¶Compute the dice loss on one or two heatmaps of shape (N, 1, H, W);
if two heatmaps are given, their losses are added together.
Source code in mindocr\losses\det_loss.py
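The single-heatmap dice loss can be sketched in NumPy, assuming the usual 1 - 2|P∩G|/(|P|+|G|) form with the ignore mask applied to both prediction and ground truth (the exact epsilon handling in det_loss.py may differ).

```python
import numpy as np

def dice_loss(pred, gt, mask, eps=1e-6):
    """Sketch of dice loss on one heatmap, with an ignore mask."""
    pred = pred * mask
    gt = gt * mask
    inter = (pred * gt).sum()
    union = pred.sum() + gt.sum() + eps
    return float(1.0 - 2.0 * inter / union)

pred = gt = mask = np.ones((4, 4))
print(dice_loss(pred, gt, mask))  # near 0 for a perfect match
```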
mindocr.losses.det_loss.L1BalancedCELoss
¶
Bases: nn.LossBase
Balanced CrossEntropy Loss on binary,
MaskL1Loss on thresh,
DiceLoss on thresh_binary.
Note: The meaning of inputs can be figured out in SegDetectorLossBuilder.
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.L1BalancedCELoss.construct(pred, gt, gt_mask, thresh_map, thresh_mask)
¶Compute dbnet loss
| PARAMETER | DESCRIPTION |
|---|---|
pred |
network prediction consists of
binary: The text segmentation prediction.
thresh: The threshold prediction (optional)
thresh_binary: Value produced by
TYPE:
|
gt |
Text regions bitmap gt.
TYPE:
|
mask |
Ignore mask; pixels with value 1 indicate no contribution to the loss.
TYPE:
|
thresh_mask |
Mask indicating the regions covered by threshold supervision.
TYPE:
|
thresh_map |
Threshold gt.
TYPE:
|
Return
loss value (Tensor)
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.MaskL1Loss
¶
Bases: nn.LossBase
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.MaskL1Loss.construct(pred, gt, mask)
¶| PARAMETER | DESCRIPTION |
|---|---|
pred |
shape :math:
|
gt |
shape :math:
|
mask |
shape :math:
|
Source code in mindocr\losses\det_loss.py
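A masked L1 loss amounts to a mean absolute error restricted to the masked positions. A minimal NumPy sketch (illustrative name; the epsilon guard against an all-zero mask is an assumption):

```python
import numpy as np

def mask_l1_loss(pred, gt, mask, eps=1e-6):
    """Sketch of masked L1: mean absolute error over positions where mask == 1."""
    return float((np.abs(pred - gt) * mask).sum() / (mask.sum() + eps))

pred = np.zeros((2, 2))
gt = np.ones((2, 2))
mask = np.array([[1.0, 0.0], [0.0, 0.0]])
print(mask_l1_loss(pred, gt, mask))  # ~1.0: only the top-left position counts
```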
mindocr.losses.det_loss.PSEDiceLoss
¶
Bases: nn.Cell
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.PSEDiceLoss.construct(model_predict, gt_texts, gt_kernels, training_masks)
¶:param model_predict: [N * 7 * H * W]
:param gt_texts: [N * H * W]
:param gt_kernels: [N * 6 * H * W]
:param training_masks: [N * H * W]
:return:
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.PSEDiceLoss.dice_loss(input_params, target, mask)
¶:param input: [N, H, W]
:param target: [N, H, W]
:param mask: [N, H, W]
:return:
Source code in mindocr\losses\det_loss.py
mindocr.losses.det_loss.PSEDiceLoss.ohem_batch(scores, gt_texts, training_masks)
¶:param scores: [N * H * W]
:param gt_texts: [N * H * W]
:param training_masks: [N * H * W]
:return: [N * H * W]
Source code in mindocr\losses\det_loss.py
mindocr.losses.rec_loss
¶
mindocr.losses.rec_loss.CTCLoss
¶
Bases: LossBase
CTCLoss definition
| PARAMETER | DESCRIPTION |
|---|---|
pred_seq_len(int) |
the length of the predicted character sequence. For text images, this value equals W, the width of the feature map encoded by the visual backbone, and can be obtained by probing the output shape of the network. E.g., for a training image of shape (3, 32, 100), the feature map encoded by a resnet34 backbone has shape (512, 1, 4), so W = 4 and the sequence length is 4.
|
max_label_len(int) |
the maximum number of characters in a text label, i.e. max_text_len in yaml.
|
batch_size(int) |
batch size of input logits. bs
|
Source code in mindocr\losses\rec_loss.py
mindocr.losses.rec_loss.CTCLoss.construct(pred, label)
¶| PARAMETER | DESCRIPTION |
|---|---|
pred |
network prediction, a logits Tensor of shape (W, BS, NC), where W is the sequence length, BS the batch size, and NC the number of classes (character types + blank + 1)
TYPE:
|
label |
GT sequence of character indices in shape (BS, SL), SL - sequence length, which is padded to max_text_length
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
loss value (Tensor) |
Source code in mindocr\losses\rec_loss.py
mindocr.metrics
¶
mindocr.metrics.build_metric(config, device_num=1, **kwargs)
¶
Create the metric function.
| PARAMETER | DESCRIPTION |
|---|---|
config |
configuration for metric including metric
TYPE:
|
device_num |
number of devices. If device_num > 1, metric will be computed in distributed mode,
i.e., aggregate intermediate variables (e.g., num_correct, TP) from all devices
by
TYPE:
|
Return
nn.Metric
Example
Create a RecMetric module for text recognition¶
>>> from mindocr.metrics import build_metric
>>> metric_config = {"name": "RecMetric", "main_indicator": "acc", "character_dict_path": None, "ignore_space": True, "print_flag": False}
>>> metric = build_metric(metric_config)
>>> metric
Source code in mindocr\metrics\builder.py
mindocr.metrics.builder
¶
mindocr.metrics.builder.build_metric(config, device_num=1, **kwargs)
¶
Create the metric function.
| PARAMETER | DESCRIPTION |
|---|---|
config |
configuration for metric including metric
TYPE:
|
device_num |
number of devices. If device_num > 1, metric will be computed in distributed mode,
i.e., aggregate intermediate variables (e.g., num_correct, TP) from all devices
by
TYPE:
|
Return
nn.Metric
Example
Create a RecMetric module for text recognition¶
>>> from mindocr.metrics import build_metric
>>> metric_config = {"name": "RecMetric", "main_indicator": "acc", "character_dict_path": None, "ignore_space": True, "print_flag": False}
>>> metric = build_metric(metric_config)
>>> metric
Source code in mindocr\metrics\builder.py
mindocr.metrics.cls_metrics
¶
mindocr.metrics.cls_metrics.ClsMetric
¶
Bases: object
Compute the text direction classification accuracy.
Source code in mindocr\metrics\cls_metrics.py
mindocr.metrics.cls_metrics.ClsMetric.__init__(label_list=None, **kwargs)
¶Source code in mindocr\metrics\cls_metrics.py
mindocr.metrics.det_metrics
¶
mindocr.metrics.det_metrics.DetMetric
¶
Bases: nn.Metric
Source code in mindocr\metrics\det_metrics.py
mindocr.metrics.det_metrics.DetMetric.eval()
¶Evaluate by aggregating results from batch update
| RETURNS | DESCRIPTION |
|---|---|
dict
|
average precision, recall, and f1-score of all samples |
Source code in mindocr\metrics\det_metrics.py
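The final aggregation step can be sketched as follows. The function name and the count-based inputs (num_correct, num_pred, num_gt) are illustrative; they stand in for the intermediate variables the metric sums over batches and, in distributed mode, over devices.

```python
def aggregate_det_metric(num_correct, num_pred, num_gt, eps=1e-8):
    """Sketch: turn aggregated match counts into precision / recall / f-score."""
    precision = num_correct / (num_pred + eps)
    recall = num_correct / (num_gt + eps)
    f_score = 2 * precision * recall / (precision + recall + eps)
    return {"precision": precision, "recall": recall, "f-score": f_score}

metric = aggregate_det_metric(num_correct=8, num_pred=10, num_gt=8)
print(metric)
```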
mindocr.metrics.det_metrics.DetMetric.update(*inputs)
¶compute metric on a batch of data
| PARAMETER | DESCRIPTION |
|---|---|
inputs |
contain two elements: preds, gts.
preds (dict): text detection prediction with keys:
polys: np.ndarray of shape (N, K, 4, 2)
score: np.ndarray of shape (N, K), confidence score
gts (tuple): ground truth - (polygons, ignore_tags), where polygons have shape [num_images, num_boxes, 4, 2] and ignore_tags have shape [num_images, num_boxes], which can be defined by output_columns in yaml
TYPE:
|
Source code in mindocr\metrics\det_metrics.py
mindocr.metrics.rec_metrics
¶
Metric for accuracy evaluation.
mindocr.metrics.rec_metrics.RecMetric
¶
Bases: nn.Metric
Define accuracy metric for warpctc network.
| PARAMETER | DESCRIPTION |
|---|---|
ignore_space |
remove space in prediction and ground truth text if True
DEFAULT:
|
filter_ood |
filter out-of-dictionary characters (e.g., '$' for the default digit+en dictionary) in the ground truth text. Default is True.
DEFAULT:
|
lower |
convert GT text to lower case. Recommended to set True if the dictionary does not contain upper-case letters.
DEFAULT:
|
Notes
Since OOD characters are skipped during label encoding in the data transformation by default, filter_ood should be True. (Paddle skips OOD characters in label encoding and then decodes the label indices back to a text string, which therefore contains no OOD characters.)
Source code in mindocr\metrics\rec_metrics.py
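The core comparison behind the accuracy indicator can be sketched in a few lines. The function name is illustrative, and filter_ood handling is omitted for brevity.

```python
def rec_accuracy(pred_texts, gt_texts, ignore_space=True, lower=True):
    """Sketch of RecMetric's core comparison: normalize both strings,
    then count exact matches."""
    correct = 0
    for pred, gt in zip(pred_texts, gt_texts):
        if ignore_space:
            pred, gt = pred.replace(" ", ""), gt.replace(" ", "")
        if lower:
            pred, gt = pred.lower(), gt.lower()
        correct += pred == gt
    return correct / max(len(gt_texts), 1)

print(rec_accuracy(["Hello world", "abc"], ["helloworld", "abd"]))  # 0.5
```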
mindocr.metrics.rec_metrics.RecMetric.update(*inputs)
¶Updates the internal evaluation result
| PARAMETER | DESCRIPTION |
|---|---|
inputs |
contain two elements: preds, gt.
preds (dict): prediction output by postprocess, with keys:
- texts, List[str], batch of predicted text strings, shape [BS, ]
- confs (optional), List[float], batch of confidence values for the prediction
gt (tuple or list): ground truth, order defined by output_columns in the eval dataloader. Required element: gt_texts, the ground truth texts (padded to a fixed length), shape [BS, ]. Optional element: gt_lens, lengths of the original texts if padded, shape [BS, ]
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the number of the inputs is not 2. |
Source code in mindocr\metrics\rec_metrics.py
mindocr.models
¶
mindocr.models.backbones
¶
mindocr.models.backbones.builder
¶
mindocr.models.backbones.builder.build_backbone(name, **kwargs)
¶Build the backbone network.
| PARAMETER | DESCRIPTION |
|---|---|
name |
the backbone name, which can be a registered backbone class name or a registered backbone (function) name.
TYPE:
|
kwargs |
input args for the backbone
1) if
TYPE:
|
Return
nn.Cell for backbone module
Construct
Example
build using backbone function name¶
>>> from mindocr.models.backbones import build_backbone
>>> backbone = build_backbone('det_resnet50', pretrained=True)
build using backbone class name¶
>>> from mindocr.models.backbones.mindcv_models.resnet import Bottleneck
>>> cfg_from_class = dict(name='DetResNet', block=Bottleneck, layers=[3, 4, 6, 3])
>>> backbone = build_backbone(**cfg_from_class)
>>> print(backbone)
Source code in mindocr\models\backbones\builder.py
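The name-based lookup can be sketched with a small registry. Everything here is a stand-in: the real registry is populated by the backbone modules, and the registered functions return nn.Cell instances rather than dicts.

```python
backbone_registry = {}  # hypothetical registry, filled by registration in practice

def register_backbone(fn):
    backbone_registry[fn.__name__] = fn
    return fn

@register_backbone
def det_resnet50(pretrained=False, **kwargs):
    # stand-in for a function returning an nn.Cell backbone
    return {"arch": "resnet50", "pretrained": pretrained}

def build_backbone(name, **kwargs):
    """Sketch: look the name up in the registry and call it with the kwargs."""
    if name not in backbone_registry:
        raise ValueError(f"Unknown backbone: {name}")
    return backbone_registry[name](**kwargs)

backbone = build_backbone("det_resnet50", pretrained=True)
print(backbone["pretrained"])  # True
```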
mindocr.models.backbones.cls_mobilenet_v3
¶
mindocr.models.backbones.cls_mobilenet_v3.cls_mobilenet_v3_small_100(pretrained=True, in_channels=3, **kwargs)
¶Get small MobileNetV3 model without width scaling.
Source code in mindocr\models\backbones\cls_mobilenet_v3.py
mindocr.models.backbones.det_mobilenet
¶
mindocr.models.backbones.det_resnet
¶
mindocr.models.backbones.mindcv_models
¶
models init
mindocr.models.backbones.mindcv_models.bit
¶MindSpore implementation of BiT_ResNet.
Refer to Big Transfer (BiT): General Visual Representation Learning.
mindocr.models.backbones.mindcv_models.bit.BiT_ResNet
¶
Bases: nn.Cell
BiT_ResNet model class, based on
"Big Transfer (BiT): General Visual Representation Learning" <https://arxiv.org/abs/1912.11370>_
| PARAMETER | DESCRIPTION |
|---|---|
block(Union[Bottleneck]) |
block of BiT_ResNetv2.
|
layers(tuple(int)) |
number of layers of each stage.
|
wf(int) |
width of each layer. Default: 1.
|
num_classes(int) |
number of classification classes. Default: 1000.
|
in_channels(int) |
number of channels of the input. Default: 3.
|
groups(int) |
number of groups for group conv in blocks. Default: 1.
|
base_width(int) |
base width of per-group hidden channels in blocks. Default: 64.
|
norm(nn.Cell) |
normalization layer in blocks. Default: None.
|
Source code in mindocr\models\backbones\mindcv_models\bit.py
mindocr.models.backbones.mindcv_models.bit.BiT_ResNet.forward_features(x)
¶Network forward feature extraction.
Source code in mindocr\models\backbones\mindcv_models\bit.py
mindocr.models.backbones.mindcv_models.bit.Bottleneck
¶
Bases: nn.Cell
define the basic block of BiT
| PARAMETER | DESCRIPTION |
|---|---|
in_channels(int) |
The channel number of the input tensor of the Conv2d layer.
|
channels(int) |
The channel number of the output tensor of the middle Conv2d layer.
|
stride(int) |
The movement stride of the 2D convolution kernel. Default: 1.
|
groups(int) |
Number of groups for group conv in blocks. Default: 1.
|
base_width(int) |
Base width of per-group hidden channels in blocks. Default: 64.
|
norm(nn.Cell) |
Normalization layer in blocks. Default: None.
|
down_sample(nn.Cell) |
Down sample in blocks. Default: None.
|
Source code in mindocr\models\backbones\mindcv_models\bit.py
mindocr.models.backbones.mindcv_models.bit.StdConv2d
¶
Bases: nn.Conv2d
Conv2d with Weight Standardization
| PARAMETER | DESCRIPTION |
|---|---|
in_channels(int) |
The channel number of the input tensor of the Conv2d layer.
|
out_channels(int) |
The channel number of the output tensor of the Conv2d layer.
|
kernel_size(int) |
Specifies the height and width of the 2D convolution kernel.
|
stride(int) |
The movement stride of the 2D convolution kernel. Default: 1.
|
pad_mode(str) |
Specifies padding mode. The optional values are "same", "valid", "pad". Default: "same".
|
padding(int) |
The number of padding on the height and width directions of the input. Default: 0.
|
group(int) |
Splits filter into groups. Default: 1.
|
Source code in mindocr\models\backbones\mindcv_models\bit.py
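Weight Standardization itself is a small transform on the convolution weights: each output filter is normalized to zero mean and unit variance before the convolution uses it. A NumPy sketch (the epsilon value and reshape layout are assumptions):

```python
import numpy as np

def standardize_weight(w, eps=1e-5):
    """Sketch of Weight Standardization: per output filter, subtract the mean
    and divide by the standard deviation before convolving."""
    flat = w.reshape(w.shape[0], -1)  # (out_channels, in_channels*kH*kW)
    mean = flat.mean(axis=1, keepdims=True)
    var = flat.var(axis=1, keepdims=True)
    return ((flat - mean) / np.sqrt(var + eps)).reshape(w.shape)

w = np.random.randn(8, 3, 3, 3)
ws = standardize_weight(w)
print(ws.reshape(8, -1).mean(axis=1))  # per-filter means ~ 0
```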
mindocr.models.backbones.mindcv_models.bit.BiTresnet101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get a 101-layer ResNet model.
Refer to the base class models.BiT_Resnet for more details.
Source code in mindocr\models\backbones\mindcv_models\bit.py
mindocr.models.backbones.mindcv_models.bit.BiTresnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get a 50-layer ResNet model.
Refer to the base class models.BiT_Resnet for more details.
Source code in mindocr\models\backbones\mindcv_models\bit.py
mindocr.models.backbones.mindcv_models.bit.BiTresnet50x3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get a 50-layer ResNet model with 3x width.
Refer to the base class models.BiT_Resnet for more details.
Source code in mindocr\models\backbones\mindcv_models\bit.py
mindocr.models.backbones.mindcv_models.cait
¶MindSpore implementation of CaiT.
Refer to Going deeper with Image Transformers.
mindocr.models.backbones.mindcv_models.cait.AttentionTalkingHead
¶
Bases: nn.Cell
Talking head is a trick for multi-head attention that adds two more linear maps, one before and one after the softmax, compared to normal attention.
Source code in mindocr\models\backbones\mindcv_models\cait.py
mindocr.models.backbones.mindcv_models.coat
¶CoaT architecture. Modified from timm/models/vision_transformer.py
mindocr.models.backbones.mindcv_models.coat.CoaT
¶
Bases: nn.Cell
CoaT class.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.ConvPosEnc
¶
Bases: nn.Cell
Convolutional Position Encoding. Note: This module is similar to the conditional position encoding in CPVT.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.FactorAtt_ConvRelPosEnc
¶
Bases: nn.Cell
Factorized attention with convolutional relative position encoding class.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.Mlp
¶
Bases: nn.Cell
MLP Cell
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.ParallelBlock
¶
Bases: nn.Cell
Parallel block class.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.ParallelBlock.downsample(x, output_size, size)
¶Feature map down-sampling.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.ParallelBlock.interpolate(x, output_size, size)
¶Feature map interpolation.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.ParallelBlock.upsample(x, output_size, size)
¶Feature map up-sampling.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.PatchEmbed
¶
Bases: nn.Cell
Image to Patch Embedding
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.coat.SerialBlock
¶
Bases: nn.Cell
Serial block class. Note: In this implementation, each serial block only contains a conv-attention and a FFN (MLP) module.
Source code in mindocr\models\backbones\mindcv_models\coat.py
mindocr.models.backbones.mindcv_models.convit
¶MindSpore implementation of ConViT.
Refer to ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
mindocr.models.backbones.mindcv_models.convit.Block
¶
Bases: nn.Cell
Basic module of ConViT
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convit.ConViT
¶
Bases: nn.Cell
ConViT model class, based on '"Improving Vision Transformers with Soft Convolutional Inductive Biases" https://arxiv.org/pdf/2103.10697.pdf'
| PARAMETER | DESCRIPTION |
|---|---|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
image_size |
images input size. Default: 224.
TYPE:
|
patch_size |
image patch size. Default: 16.
TYPE:
|
embed_dim |
embedding dimension in all head. Default: 48.
TYPE:
|
num_heads |
number of heads. Default: 12.
TYPE:
|
drop_rate |
dropout rate. Default: 0.
TYPE:
|
drop_path_rate |
drop path rate. Default: 0.1.
TYPE:
|
depth |
model block depth. Default: 12.
TYPE:
|
mlp_ratio |
ratio of hidden features in Mlp. Default: 4.
TYPE:
|
qkv_bias |
have bias in qkv layers or not. Default: False.
TYPE:
|
attn_drop_rate |
attention layers dropout rate. Default: 0.
TYPE:
|
locality_strength |
determines how focused each head is around its attention center. Default: 1.
TYPE:
|
local_up_to_layer |
number of GPSA layers. Default: 10.
TYPE:
|
use_pos_embed |
whether to use position embedding. Default: True.
TYPE:
|
locality_strength(float) |
the strength of locality. Default: 1.
|
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convit.convit_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConViT base model. Refer to the base class "models.ConViT" for more details.
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convit.convit_base_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConViT base+ model. Refer to the base class "models.ConViT" for more details.
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convit.convit_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConViT small model. Refer to the base class "models.ConViT" for more details.
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convit.convit_small_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConViT small+ model. Refer to the base class "models.ConViT" for more details.
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convit.convit_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConViT tiny model. Refer to the base class "models.ConViT" for more details.
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convit.convit_tiny_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConViT tiny+ model. Refer to the base class "models.ConViT" for more details.
Source code in mindocr\models\backbones\mindcv_models\convit.py
mindocr.models.backbones.mindcv_models.convnext
¶MindSpore implementation of ConvNeXt.
Refer to: A ConvNet for the 2020s
mindocr.models.backbones.mindcv_models.convnext.Block
¶
Bases: nn.Cell
ConvNeXt Block
There are two equivalent implementations
(1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W) (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back
Unlike the official impl, this one allows choice of 1 or 2, 1x1 conv can be faster with appropriate choice of LayerNorm impl, however as model size increases the tradeoffs appear to change and nn.Linear is a better choice. This was observed with PyTorch 1.10 on 3090 GPU, it could change over time & w/ different HW.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
Number of input channels.
TYPE:
|
drop_path |
Stochastic depth rate. Default: 0.0
TYPE:
|
layer_scale_init_value |
Init value for Layer Scale. Default: 1e-6.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\convnext.py
mindocr.models.backbones.mindcv_models.convnext.ConvNeXt
¶
Bases: nn.Cell
ConvNeXt model class, based on '"A ConvNet for the 2020s" https://arxiv.org/abs/2201.03545'
| PARAMETER | DESCRIPTION |
|---|---|
in_channels |
number of input channels.
TYPE:
|
num_classes |
number of predicted classes.
TYPE:
|
depths |
the depths of each layer.
TYPE:
|
dims |
the middle dim of each layer.
TYPE:
|
drop_path_rate |
the rate of droppath default : 0.
TYPE:
|
layer_scale_init_value |
the parameter of init for the classifier default : 1e-6.
TYPE:
|
head_init_scale |
the parameter of init for the head default : 1.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\convnext.py
(lines 116–204)
mindocr.models.backbones.mindcv_models.convnext.ConvNextLayerNorm
¶
Bases: nn.LayerNorm
LayerNorm for channels_first tensors with 2d spatial dimensions (i.e. N, C, H, W).
Source code in mindocr\models\backbones\mindcv_models\convnext.py
(lines 48–68)
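Standard LayerNorm normalizes the last axis, so a channels_first variant has to reduce over C for every (n, h, w) position instead. A pure-Python sketch of that reduction (illustrative only, using nested lists instead of tensors):

```python
import math

def layer_norm_channels_first(x, gamma, beta, eps=1e-6):
    """LayerNorm over the channel axis of an (N, C, H, W) nested list:
    mean and variance are computed across C independently for every
    (n, h, w) location, then scaled/shifted per channel."""
    n_, c_, h_, w_ = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    out = [[[[0.0] * w_ for _ in range(h_)] for _ in range(c_)] for _ in range(n_)]
    for n in range(n_):
        for h in range(h_):
            for w in range(w_):
                vals = [x[n][c][h][w] for c in range(c_)]
                mu = sum(vals) / c_
                var = sum((v - mu) ** 2 for v in vals) / c_
                for c in range(c_):
                    norm = (x[n][c][h][w] - mu) / math.sqrt(var + eps)
                    out[n][c][h][w] = norm * gamma[c] + beta[c]
    return out

x = [[[[1.0]], [[3.0]]]]  # N=1, C=2, H=W=1
y = layer_norm_channels_first(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

In a real framework this is one transpose plus a regular LayerNorm; the loop form just makes the reduction axis explicit.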
mindocr.models.backbones.mindcv_models.convnext.convnext_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConvNeXt base model. Refer to the base class 'models.ConvNeXt' for more details.
Source code in mindocr\models\backbones\mindcv_models\convnext.py
(lines 239–252)
mindocr.models.backbones.mindcv_models.convnext.convnext_large(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConvNeXt large model. Refer to the base class 'models.ConvNeXt' for more details.
Source code in mindocr\models\backbones\mindcv_models\convnext.py
(lines 255–268)
mindocr.models.backbones.mindcv_models.convnext.convnext_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConvNeXt small model. Refer to the base class 'models.ConvNeXt' for more details.
Source code in mindocr\models\backbones\mindcv_models\convnext.py
(lines 223–236)
mindocr.models.backbones.mindcv_models.convnext.convnext_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConvNeXt tiny model. Refer to the base class 'models.ConvNeXt' for more details.
Source code in mindocr\models\backbones\mindcv_models\convnext.py
(lines 207–220)
mindocr.models.backbones.mindcv_models.convnext.convnext_xlarge(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ConvNeXt xlarge model. Refer to the base class 'models.ConvNeXt' for more details.
Source code in mindocr\models\backbones\mindcv_models\convnext.py
(lines 271–284)
mindocr.models.backbones.mindcv_models.crossvit
¶MindSpore implementation of CrossViT.
Refer to: CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
mindocr.models.backbones.mindcv_models.crossvit.PatchEmbed
¶
Bases: nn.Cell
Image to Patch Embedding
Source code in mindocr\models\backbones\mindcv_models\crossvit.py
(lines 98–141)
mindocr.models.backbones.mindcv_models.crossvit.VisionTransformer
¶
Bases: nn.Cell
Vision Transformer with support for patch or hybrid CNN input stage
Source code in mindocr\models\backbones\mindcv_models\crossvit.py
(lines 305–445)
mindocr.models.backbones.mindcv_models.densenet
¶MindSpore implementation of DenseNet.
Refer to: Densely Connected Convolutional Networks
mindocr.models.backbones.mindcv_models.densenet.DenseNet
¶
Bases: nn.Cell
Densenet-BC model class, based on
"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| growth_rate | how many filters to add each layer (`k` in the paper). |
| block_config | how many layers in each pooling block. Default: (6, 12, 24, 16). |
| num_init_features | number of filters in the first Conv2d. Default: 64. |
| bn_size | multiplicative factor for number of bottleneck layers (i.e. bn_size * k features in the bottleneck layer). Default: 4. |
| drop_rate | dropout rate after each dense layer. Default: 0. |
| in_channels | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
Source code in mindocr\models\backbones\mindcv_models\densenet.py
(lines 125–221)
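The growth_rate and block_config parameters fully determine how the channel count evolves: every dense layer appends growth_rate feature maps, and each transition between blocks halves the total. A small sketch of that bookkeeping (illustrative, not the library's code):

```python
def densenet_channels(num_init_features, growth_rate, block_config):
    """Track feature-map channels through DenseNet-BC: each dense layer adds
    growth_rate channels; each transition between blocks halves the count."""
    channels = num_init_features
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate
        if i != len(block_config) - 1:  # no transition after the last block
            channels //= 2
    return channels

# DenseNet-121: k=32, blocks (6, 12, 24, 16) -> 1024 channels before the classifier
print(densenet_channels(64, 32, (6, 12, 24, 16)))  # 1024
```

The same arithmetic with DenseNet-161's settings (96 init features, k=48, blocks (6, 12, 36, 24)) yields the familiar 2208-channel final feature map.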
mindocr.models.backbones.mindcv_models.densenet.densenet121(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 121-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindocr\models\backbones\mindcv_models\densenet.py
(lines 224–235)
mindocr.models.backbones.mindcv_models.densenet.densenet161(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 161-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindocr\models\backbones\mindcv_models\densenet.py
(lines 238–249)
mindocr.models.backbones.mindcv_models.densenet.densenet169(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 169-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindocr\models\backbones\mindcv_models\densenet.py
(lines 252–263)
mindocr.models.backbones.mindcv_models.densenet.densenet201(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 201-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindocr\models\backbones\mindcv_models\densenet.py
(lines 266–277)
mindocr.models.backbones.mindcv_models.download
¶Downloading utilities.
mindocr.models.backbones.mindcv_models.download.DownLoad
¶Base utility class for downloading.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 35–185)
mindocr.models.backbones.mindcv_models.download.DownLoad.calculate_md5(file_path, chunk_size=1024 * 1024)
staticmethod
¶Calculate md5 value.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 43–50)
mindocr.models.backbones.mindcv_models.download.DownLoad.check_md5(file_path, md5=None)
¶Check md5 value.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 52–54)
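The two md5 helpers above are easy to sketch with the standard library: hash the file in fixed-size chunks so large archives never load fully into memory, then compare digests. A sketch consistent with the signatures shown (not necessarily the library's exact code):

```python
import hashlib

def calculate_md5(file_path, chunk_size=1024 * 1024):
    """Hash a file incrementally, chunk_size bytes at a time."""
    md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def check_md5(file_path, md5=None):
    """No expected digest counts as a pass; otherwise compare hex digests."""
    return md5 is None or md5 == calculate_md5(file_path)
```

The `iter(callable, sentinel)` form stops cleanly at EOF, when `read` returns the empty bytes sentinel.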
mindocr.models.backbones.mindcv_models.download.DownLoad.download_and_extract_archive(url, download_path=None, extract_path=None, filename=None, md5=None, remove_finished=False)
¶Download and extract archive.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 162–185)
mindocr.models.backbones.mindcv_models.download.DownLoad.download_file(url, file_path, chunk_size=1024)
¶Download a file.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 99–119)
mindocr.models.backbones.mindcv_models.download.DownLoad.download_url(url, path=None, filename=None, md5=None)
¶Download a file from a url and place it in root.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 121–160)
mindocr.models.backbones.mindcv_models.download.DownLoad.extract_archive(from_path, to_path=None)
¶Extract an archive from from_path to to_path.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 71–97)
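Given the separate extract_zip and extract_tar methods below, extract_archive is essentially a dispatch on the file suffix. A stdlib sketch of that idea (an assumption about the control flow; the library may support more formats):

```python
import os
import tarfile
import zipfile

def extract_archive(from_path, to_path=None):
    """Dispatch on suffix: .zip via zipfile, .tar/.tar.gz/.tgz via tarfile.
    Defaults to extracting next to the archive when to_path is None."""
    to_path = to_path or os.path.dirname(from_path)
    if from_path.endswith(".zip"):
        with zipfile.ZipFile(from_path) as zf:
            zf.extractall(to_path)
    elif from_path.endswith((".tar", ".tar.gz", ".tgz")):
        with tarfile.open(from_path) as tf:
            tf.extractall(to_path)
    else:
        raise ValueError(f"Unsupported archive format: {from_path}")
    return to_path
```

Both `extractall` calls create missing output directories, so the caller only supplies a path.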
mindocr.models.backbones.mindcv_models.download.DownLoad.extract_tar(from_path, to_path=None, compression=None)
staticmethod
¶Extract tar format file.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 56–61)
mindocr.models.backbones.mindcv_models.download.DownLoad.extract_zip(from_path, to_path=None, compression=None)
staticmethod
¶Extract zip format file.
Source code in mindocr\models\backbones\mindcv_models\download.py
(lines 63–69)
mindocr.models.backbones.mindcv_models.dpn
¶MindSpore implementation of DPN.
Refer to: Dual Path Networks
mindocr.models.backbones.mindcv_models.dpn.BottleBlock
¶
Bases: nn.Cell
A block for the Dual Path Architecture
Source code in mindocr\models\backbones\mindcv_models\dpn.py
(lines 44–77)
mindocr.models.backbones.mindcv_models.dpn.DPN
¶
Bases: nn.Cell
DPN model class, based on
"Dual Path Networks" <https://arxiv.org/pdf/1707.01629.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| num_init_channel | the output channel of the first blocks. Default: 64. |
| k_r | the first channel of each stage. Default: 96. |
| g | number of groups in the conv2d. Default: 32. |
| k_sec | multiplicative factor for the number of bottleneck layers. Default: 4. |
| inc_sec | the first output channel in each stage. Default: (16, 32, 24, 128). |
| in_channels | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
Source code in mindocr\models\backbones\mindcv_models\dpn.py
(lines 140–259)
mindocr.models.backbones.mindcv_models.dpn.DualPathBlock
¶
Bases: nn.Cell
A block for Dual Path Networks that combines the projection, residual, and densely connected paths
Source code in mindocr\models\backbones\mindcv_models\dpn.py
(lines 80–137)
mindocr.models.backbones.mindcv_models.dpn.dpn107(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 107-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindocr\models\backbones\mindcv_models\dpn.py
(lines 304–315)
mindocr.models.backbones.mindcv_models.dpn.dpn131(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 131-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindocr\models\backbones\mindcv_models\dpn.py
(lines 290–301)
mindocr.models.backbones.mindcv_models.dpn.dpn92(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 92-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindocr\models\backbones\mindcv_models\dpn.py
(lines 262–273)
mindocr.models.backbones.mindcv_models.dpn.dpn98(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 98-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindocr\models\backbones\mindcv_models\dpn.py
(lines 276–287)
mindocr.models.backbones.mindcv_models.edgenext
¶MindSpore implementation of EdgeNeXt.
Refer to EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications.
mindocr.models.backbones.mindcv_models.edgenext.EdgeNeXt
¶
Bases: nn.Cell
EdgeNeXt model class, based on
"Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision" <https://arxiv.org/abs/2206.10589>_
| PARAMETER | DESCRIPTION |
|---|---|
| in_channels | number of input channels. Default: 3 |
| num_classes | number of classification classes. Default: 1000 |
| depths | the depths of each layer. Default: [0, 0, 0, 3] |
| dims | the middle dim of each layer. Default: [24, 48, 88, 168] |
| global_block | number of global blocks. Default: [0, 0, 0, 3] |
| global_block_type | type of global block. Default: ['None', 'None', 'None', 'SDTA'] |
| drop_path_rate | stochastic depth rate. Default: 0. |
| layer_scale_init_value | value of layer scale initialization. Default: 1e-6 |
| head_init_scale | scale of head initialization. Default: 1. |
| expan_ratio | ratio of expansion. Default: 4 |
| kernel_sizes | kernel sizes of different stages. Default: [7, 7, 7, 7] |
| heads | number of attention heads. Default: [8, 8, 8, 8] |
| use_pos_embd_xca | whether to use position embedding in XCA. Default: [False, False, False, False] |
| use_pos_embd_global | whether to use position embedding globally. Default: False |
| d2_scales | scales of splitting channels |
Source code in mindocr\models\backbones\mindcv_models\edgenext.py
(lines 295–399)
mindocr.models.backbones.mindcv_models.edgenext.LayerNorm
¶
Bases: nn.LayerNorm
LayerNorm for channels_first tensors with 2d spatial dimensions (i.e. N, C, H, W).
Source code in mindocr\models\backbones\mindcv_models\edgenext.py
(lines 70–90)
mindocr.models.backbones.mindcv_models.edgenext.edgenext_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get edgenext_base model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindocr\models\backbones\mindcv_models\edgenext.py
(lines 470–489)
mindocr.models.backbones.mindcv_models.edgenext.edgenext_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get edgenext_small model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindocr\models\backbones\mindcv_models\edgenext.py
(lines 448–467)
mindocr.models.backbones.mindcv_models.edgenext.edgenext_x_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get edgenext_x_small model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindocr\models\backbones\mindcv_models\edgenext.py
(lines 425–445)
mindocr.models.backbones.mindcv_models.edgenext.edgenext_xx_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get edgenext_xx_small model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindocr\models\backbones\mindcv_models\edgenext.py
(lines 402–422)
mindocr.models.backbones.mindcv_models.efficientnet
¶EfficientNet Architecture.
mindocr.models.backbones.mindcv_models.efficientnet.EfficientNet
¶
Bases: nn.Cell
EfficientNet architecture.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| arch | The name of the model. |
| dropout_rate | The dropout rate of efficientnet. |
| width_mult | The ratio of the channels. Default: 1.0. |
| depth_mult | The ratio of num_layers. Default: 1.0. |
| in_channels | The input channels. Default: 3. |
| num_classes | The number of classes. Default: 1000. |
| inverted_residual_setting | The settings of the blocks. Default: None. |
| keep_prob | The dropout rate of MBConv. Default: 0.2. |
| norm_layer | The normalization layer. Default: None. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, 1000)`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 272–460)
mindocr.models.backbones.mindcv_models.efficientnet.EfficientNet.construct(x)
¶construct
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 440–443)
mindocr.models.backbones.mindcv_models.efficientnet.FusedMBConv
¶
Bases: nn.Cell
FusedMBConv
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 219–269)
mindocr.models.backbones.mindcv_models.efficientnet.FusedMBConvConfig
¶
Bases: MBConvConfig
FusedMBConvConfig
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 203–216)
mindocr.models.backbones.mindcv_models.efficientnet.MBConv
¶
Bases: nn.Cell
MBConv Module.
| PARAMETER | DESCRIPTION |
|---|---|
| cnf | The config class holding the parameters (in_channels, out_channels, num_layers) and the helpers that recalculate them after multiplying by the expand_ratio. |
| keep_prob | The dropout rate in MBConv. Default: 0.8. |
| norm | The BatchNorm method. Default: None. |
| se_layer | The squeeze-excite module. Default: SqueezeExcite. |
| RETURNS | DESCRIPTION |
|---|---|
| | Tensor |
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 136–200)
mindocr.models.backbones.mindcv_models.efficientnet.MBConvConfig
¶The parameters of MBConv, which need to be multiplied by the expand_ratio.
| PARAMETER | DESCRIPTION |
|---|---|
| expand_ratio | The expansion factor of out_channels with respect to in_channels. |
| kernel_size | The kernel size of the depthwise conv. |
| stride | The stride of the depthwise conv. |
| in_chs | The input channels of the MBConv module. |
| out_chs | The output channels of the MBConv module. |
| num_layers | The number of MBConv modules. |
| width_cnf | The ratio of the channels. Default: 1.0. |
| depth_cnf | The ratio of num_layers. Default: 1.0. |
| RETURNS | DESCRIPTION |
|---|---|
| | None |
Examples:
>>> cnf = MBConvConfig(1, 3, 1, 32, 16, 1)
>>> print(cnf.input_channels)
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 64–133)
mindocr.models.backbones.mindcv_models.efficientnet.MBConvConfig.adjust_channels(channels, width_cnf, min_value=None)
staticmethod
¶Calculate the width of MBConv.
| PARAMETER | DESCRIPTION |
|---|---|
| channels | The number of channels. |
| width_cnf | The ratio of the channels. |
| min_value | The minimum number of channels. Default: None. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The width of MBConv. |
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 104–118)
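Width scaling in the EfficientNet family customarily follows a "make divisible" rule: scale by the width coefficient, round to the nearest multiple of 8, and never drop below 90% of the scaled value. A sketch of that convention (an assumption about this implementation, not a copy of it):

```python
def adjust_channels(channels, width_cnf, min_value=None, divisor=8):
    """Scale `channels` by `width_cnf`, then round to the nearest multiple
    of `divisor` without going below 90% of the scaled value."""
    v = channels * width_cnf
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor  # rounding down lost too much: bump back up
    return new_v

print(adjust_channels(32, 1.0))  # 32
```

The rounding keeps channel counts hardware-friendly while the 90% guard limits the error the rounding introduces.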
mindocr.models.backbones.mindcv_models.efficientnet.MBConvConfig.adjust_depth(num_layers, depth_cnf)
staticmethod
¶Calculate the depth of MBConv.
| PARAMETER | DESCRIPTION |
|---|---|
| num_layers | The number of MBConv modules. |
| depth_cnf | The ratio of num_layers. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The depth of MBConv. |
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 120–133)
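Depth scaling is simpler than width scaling: the repeat count is scaled and rounded up, per the usual EfficientNet convention (again an assumption about this implementation):

```python
import math

def adjust_depth(num_layers, depth_cnf):
    """Scale the stage's repeat count by depth_cnf, rounding up so a
    nonzero stage never collapses to fewer layers than intended."""
    return int(math.ceil(num_layers * depth_cnf))

print(adjust_depth(2, 1.2))  # 3
```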
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B0 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 485–502)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B1 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 505–522)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B2 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 525–542)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B3 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 545–562)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B4 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 565–582)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B5 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 585–602)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b6(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B6 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 605–622)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b7(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet B7 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 625–642)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_l(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet V2-L architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 685–702)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_m(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet V2-M architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 665–682)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_s(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet V2-S architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 645–662)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_xl(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Constructs an EfficientNet V2-XL architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| pretrained | If True, returns a model pretrained on ImageNet. Default: False. |
| num_classes | The number of classes. Default: 1000. |
| in_channels | The input channels. Default: 3. |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
(lines 705–722)
mindocr.models.backbones.mindcv_models.ghostnet
¶MindSpore implementation of GhostNet.
mindocr.models.backbones.mindcv_models.ghostnet.ConvBnAct
¶
Bases: nn.Cell
A block for conv bn and relu
Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
(lines 48-63)
mindocr.models.backbones.mindcv_models.ghostnet.GhostGate
¶
Bases: nn.Cell
Implementation of the hard-sigmoid gate: relu6(x + 3) / 6
Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
(lines 37-45)
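A framework-free sketch of the gate's arithmetic (plain Python, no MindSpore; `ghost_gate` is an illustrative name, assuming the standard hard-sigmoid form relu6(x + 3) / 6 used in GhostNet's SE module):

```python
def relu6(x: float) -> float:
    """Clamp x to [0, 6], i.e. min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def ghost_gate(x: float) -> float:
    """Hard-sigmoid gate: relu6(x + 3) / 6.

    A piecewise-linear approximation of sigmoid that saturates
    at 0 for x <= -3 and at 1 for x >= 3.
    """
    return relu6(x + 3.0) / 6.0
```

Like sigmoid, the gate maps any real input into [0, 1], but avoids the exponential.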
mindocr.models.backbones.mindcv_models.ghostnet.GhostNet
¶
Bases: nn.Cell
GhostNet model class, based on
"GhostNet: More Features from Cheap Operations " <https://arxiv.org/abs/1911.11907>_
| PARAMETER | DESCRIPTION |
|---|---|
cfgs |
the config of the GhostNet.
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of input channels. Default: 3.
TYPE:
|
width |
base width of hidden channel in blocks. Default: 1.0
TYPE:
|
dropout |
the dropout probability of the features before classification. Default: 0.2
|
Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
(lines 171-271)
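GhostNet's central trick is that each layer generates only part of its output with a full convolution and synthesizes the rest with cheap depthwise "ghost" operations. A sketch of the channel bookkeeping (plain Python; `ghost_channels` and `ratio` are illustrative names from the paper's formulation, not arguments of this class):

```python
import math


def ghost_channels(out_channels: int, ratio: int = 2) -> tuple:
    """Split output channels between the primary conv and the cheap ghost op.

    The primary convolution produces ceil(out_channels / ratio) intrinsic
    feature maps; cheap depthwise ops generate (ratio - 1) ghosts per
    intrinsic map. The concatenation is sliced back to out_channels.
    """
    init_channels = math.ceil(out_channels / ratio)
    ghost = init_channels * (ratio - 1)
    return init_channels, ghost
```

With ratio 2, half the feature maps come from the expensive convolution and half from the cheap op, roughly halving the FLOPs of that layer.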
mindocr.models.backbones.mindcv_models.ghostnet.ghostnet_1x(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get GhostNet model. Refer to the base class 'models.GhostNet' for more details.
Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
(lines 334-345)
mindocr.models.backbones.mindcv_models.ghostnet.ghostnet_nose_1x(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get GhostNet model without SEModule. Refer to the base class 'models.GhostNet' for more details.
Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
(lines 348-359)
mindocr.models.backbones.mindcv_models.hrnet
¶MindSpore implementation of HRNet.
Refer to Deep High-Resolution Representation Learning for Visual Recognition
mindocr.models.backbones.mindcv_models.hrnet.BasicBlock
¶
Bases: nn.Cell
Basic block of HRNet
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 44-97)
mindocr.models.backbones.mindcv_models.hrnet.Bottleneck
¶
Bases: nn.Cell
Bottleneck block of HRNet
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 100-160)
mindocr.models.backbones.mindcv_models.hrnet.HRModule
¶
Bases: nn.Cell
High-Resolution Module for HRNet. In this module, every branch has 4 BasicBlocks/Bottlenecks. Fusion/Exchange is in this module.
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 163-354)
mindocr.models.backbones.mindcv_models.hrnet.HRNet
¶
Bases: nn.Cell
HRNet Backbone, based on
"Deep High-Resolution Representation Learning for Visual Recognition"
<https://arxiv.org/abs/1908.07919>_.
| PARAMETER | DESCRIPTION |
|---|---|
stage_cfg |
Configuration of the extra blocks. It accepts a dictionary
storing the detailed config of each block, which includes
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
Number of channels of the input. Default: 3.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 357-681)
mindocr.models.backbones.mindcv_models.hrnet.HRNet.forward_features(x)
¶Perform the feature extraction.
| PARAMETER | DESCRIPTION |
|---|---|
x |
Tensor
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Extracted feature |
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 620-666)
mindocr.models.backbones.mindcv_models.hrnet.IdentityCell
¶
Bases: nn.Cell
Identity Cell
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 34-41)
mindocr.models.backbones.mindcv_models.hrnet.hrnet_w32(pretrained=False, num_classes=1000, in_channels=3)
¶Get HRNet with width=32 model.
Refer to the base class models.HRNet for more details.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
Whether the model is pretrained. Default: False
TYPE:
|
num_classes |
number of classification classes. Default: 1000
TYPE:
|
in_channels |
Number of input channels. Default: 3
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
HRNet
|
HRNet model |
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 684-736)
mindocr.models.backbones.mindcv_models.hrnet.hrnet_w48(pretrained=False, num_classes=1000, in_channels=3)
¶Get HRNet with width=48 model.
Refer to the base class models.HRNet for more details.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
Whether the model is pretrained. Default: False
TYPE:
|
num_classes |
number of classification classes. Default: 1000
TYPE:
|
in_channels |
Number of input channels. Default: 3
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
HRNet
|
HRNet model |
Source code in mindocr\models\backbones\mindcv_models\hrnet.py
(lines 739-791)
mindocr.models.backbones.mindcv_models.layers
¶layers init
mindocr.models.backbones.mindcv_models.layers.activation
¶Custom operators.
mindocr.models.backbones.mindcv_models.layers.activation.Swish
¶
Bases: nn.Cell
Swish activation function: x * sigmoid(x).
Return
Tensor
Example
x = Tensor(((20, 16), (50, 50)), mindspore.float32)
Swish()(x)
Source code in mindocr\models\backbones\mindcv_models\layers\activation.py
(lines 10-32)
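Numerically, swish is just x·σ(x); a framework-free sketch of the same function (plain Python, no MindSpore required):

```python
import math


def swish(x: float) -> float:
    """Swish activation: x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))
```

swish(0) is exactly 0, swish(x) approaches x for large positive inputs, and it is slightly negative (not clamped to zero) for moderate negative inputs, which is what distinguishes it from ReLU.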
mindocr.models.backbones.mindcv_models.layers.conv_norm_act
¶Conv2d + BN + Act
mindocr.models.backbones.mindcv_models.layers.conv_norm_act.Conv2dNormActivation
¶
Bases: nn.Cell
Conv2d + BN + Act
Source code in mindocr\models\backbones\mindcv_models\layers\conv_norm_act.py
(lines 7-60)
mindocr.models.backbones.mindcv_models.layers.drop_path
¶DropPath Mindspore implementations of DropPath (Stochastic Depth) regularization layers. Papers: Deep Networks with Stochastic Depth (https://arxiv.org/abs/1603.09382)
mindocr.models.backbones.mindcv_models.layers.drop_path.DropPath
¶
Bases: nn.Cell
DropPath (Stochastic Depth) regularization layers
Source code in mindocr\models\backbones\mindcv_models\layers\drop_path.py
(lines 10-30)
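Stochastic depth drops an entire residual branch per sample during training and rescales survivors so the expected output is unchanged; at inference the branch always passes through. A minimal sketch of the per-sample rule (plain Python; assumes the usual 1/keep_prob rescaling, with scalar stand-ins for tensors):

```python
import random


def drop_path(branch_output: float, drop_prob: float, training: bool) -> float:
    """Zero the residual branch with probability drop_prob during training,
    scaling survivors by 1 / keep_prob so E[output] equals branch_output."""
    if not training or drop_prob == 0.0:
        return branch_output
    keep_prob = 1.0 - drop_prob
    if random.random() < drop_prob:
        return 0.0          # branch dropped for this sample
    return branch_output / keep_prob
```

Note the drop applies to the whole branch (a full layer path), unlike dropout, which zeroes individual activations.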
mindocr.models.backbones.mindcv_models.layers.helpers
¶Layer/Module Helpers
mindocr.models.backbones.mindcv_models.layers.identity
¶Identity Module
mindocr.models.backbones.mindcv_models.layers.identity.Identity
¶
Bases: nn.Cell
Identity
Source code in mindocr\models\backbones\mindcv_models\layers\identity.py
(lines 5-9)
mindocr.models.backbones.mindcv_models.layers.mlp
¶MLP module w/ dropout and configurable activation layer
mindocr.models.backbones.mindcv_models.layers.patch_embed
¶Image to Patch Embedding using Conv2d A convolution based approach to patchifying a 2D image w/ embedding projection.
mindocr.models.backbones.mindcv_models.layers.patch_embed.PatchEmbed
¶
Bases: nn.Cell
Image to Patch Embedding
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
Image size. Default: 224.
TYPE:
|
patch_size |
Patch token size. Default: 4.
TYPE:
|
in_chans |
Number of input image channels. Default: 3.
TYPE:
|
embed_dim |
Number of linear projection output channels. Default: 96.
TYPE:
|
norm_layer |
Normalization layer. Default: None
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\layers\patch_embed.py
(lines 10-60)
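With the defaults above, a 224x224 image is cut into non-overlapping 4x4 patches, each projected to 96 channels. The shape arithmetic (plain Python; `patch_embed_shape` is an illustrative helper, assuming kernel = stride = patch_size with no padding):

```python
def patch_embed_shape(image_size: int = 224, patch_size: int = 4,
                      embed_dim: int = 96) -> tuple:
    """Return (num_patches, embed_dim) from a conv-based patchifier."""
    grid = image_size // patch_size   # patches per spatial side
    return grid * grid, embed_dim
```

For example, 224 // 4 = 56 patches per side, giving 3136 patch tokens of dimension 96; ViT-style 16x16 patches on the same image would give 196 tokens instead.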
mindocr.models.backbones.mindcv_models.layers.patch_embed.PatchEmbed.construct(x)
¶Compute the patch embedding of the input.
Source code in mindocr\models\backbones\mindcv_models\layers\patch_embed.py
(lines 51-60)
mindocr.models.backbones.mindcv_models.layers.pooling
¶GlobalAvgPooling Module
mindocr.models.backbones.mindcv_models.layers.pooling.GlobalAvgPooling
¶
Bases: nn.Cell
GlobalAvgPooling, same as torch.nn.AdaptiveAvgPool2d when output shape is 1
Source code in mindocr\models\backbones\mindcv_models\layers\pooling.py
(lines 5-16)
mindocr.models.backbones.mindcv_models.layers.selective_kernel
¶Selective Kernel Convolution/Attention Paper: Selective Kernel Networks (https://arxiv.org/abs/1903.06586)
mindocr.models.backbones.mindcv_models.layers.selective_kernel.SelectiveKernel
¶
Bases: nn.Cell
Selective Kernel Convolution Module. As described in Selective Kernel Networks (https://arxiv.org/abs/1903.06586) with some modifications. The largest change is the input split, which divides the input channels across each convolution path; this can be viewed as a form of grouping, but the output channel counts expand to the module-level value. This keeps the parameter count from ballooning when the convolutions themselves don't have groups, but still provides a noteworthy increase in performance over similar-param-count models without this attention layer. -Ross W
| PARAMETER | DESCRIPTION |
|---|---|
in_channels |
module input (feature) channel count
TYPE:
|
out_channels |
module output (feature) channel count
TYPE:
|
kernel_size |
kernel size for each convolution branch
TYPE:
|
stride |
stride for convolutions
TYPE:
|
dilation |
dilation for module as a whole, impacts dilation of each branch
TYPE:
|
groups |
number of groups for each branch
TYPE:
|
rd_ratio |
reduction factor for attention features
TYPE:
|
rd_channels(int) |
reduction channels can be specified directly by arg (if rd_channels is set)
|
rd_divisor(int) |
divisor can be specified to keep channels % divisor == 0
|
keep_3x3 |
keep all branch convolution kernels as 3x3, changing larger kernels for dilations
TYPE:
|
split_input |
split input channels evenly across each convolution branch, keeps param count lower, can be viewed as grouping by path, output expands to module out_channels count
TYPE:
|
activation |
activation layer to use
TYPE:
|
norm |
batchnorm/norm layer to use
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\layers\selective_kernel.py
(lines 57-140)
mindocr.models.backbones.mindcv_models.layers.selective_kernel.SelectiveKernelAttn
¶
Bases: nn.Cell
Selective Kernel Attention Module. Selective Kernel attention mechanism factored out into its own module.
Source code in mindocr\models\backbones\mindcv_models\layers\selective_kernel.py
(lines 23-54)
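In the standard selective-kernel formulation, the attention module ends with a softmax across the k branches, so each channel's branch weights sum to one and the branch outputs are mixed as a convex combination. A sketch of that final normalization step for one channel (plain Python; `branch_weights` is an illustrative name):

```python
import math


def branch_weights(logits: list) -> list:
    """Softmax over the per-branch attention logits of one channel."""
    m = max(logits)                         # subtract max for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Equal logits mix the branches equally; a dominant logit lets that kernel size win for the channel.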
mindocr.models.backbones.mindcv_models.layers.squeeze_excite
¶Squeeze-and-Excitation Channel Attention
An SE implementation originally based on PyTorch SE-Net impl.
Has since evolved with additional functionality / configuration.
Paper: Squeeze-and-Excitation Networks - https://arxiv.org/abs/1709.01507
mindocr.models.backbones.mindcv_models.layers.squeeze_excite.SqueezeExcite
¶
Bases: nn.Cell
SqueezeExcite Module as defined in original SE-Nets with a few additions.
Additions include
- divisor can be specified to keep channels % div == 0 (default: 8)
- reduction channels can be specified directly by arg (if rd_channels is set)
- reduction channels can be specified by float rd_ratio (default: 1/16)
- customizable activation, normalization, and gate layer
Source code in mindocr\models\backbones\mindcv_models\layers\squeeze_excite.py
(lines 14-65)
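The three reduction-width rules listed above (direct rd_channels, float rd_ratio, divisibility by rd_divisor) can be sketched as a single helper (plain Python; the make-divisible rounding shown is the common convention, and the function name is illustrative):

```python
from typing import Optional


def se_reduction_channels(channels: int, rd_ratio: float = 1 / 16,
                          rd_channels: Optional[int] = None,
                          rd_divisor: int = 8) -> int:
    """Width of the squeeze layer: given directly via rd_channels, or
    channels * rd_ratio rounded so the result % rd_divisor == 0."""
    if rd_channels is not None:
        return rd_channels
    v = max(rd_divisor,
            int(channels * rd_ratio + rd_divisor / 2) // rd_divisor * rd_divisor)
    if v < 0.9 * channels * rd_ratio:   # never round down by more than 10%
        v += rd_divisor
    return v
```

For example, 256 channels at the default ratio 1/16 squeeze to 16 channels, which is already a multiple of 8.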
mindocr.models.backbones.mindcv_models.layers.squeeze_excite.SqueezeExciteV2
¶
Bases: nn.Cell
SqueezeExcite Module as defined in original SE-Nets with a few additions. V1 uses a 1x1 conv to replace the fc layers, while V2 uses nn.Dense directly.
Source code in mindocr\models\backbones\mindcv_models\layers\squeeze_excite.py
(lines 68-115)
mindocr.models.backbones.mindcv_models.mixnet
¶MindSpore implementation of MixNet.
Refer to MixConv: Mixed Depthwise Convolutional Kernels
mindocr.models.backbones.mindcv_models.mixnet.MDConv
¶
Bases: nn.Cell
Mixed Depth-wise Convolution
Source code in mindocr\models\backbones\mindcv_models\mixnet.py
(lines 118-165)
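Mixed depthwise convolution partitions the channels into groups, one per kernel size, and runs a depthwise conv with a different kernel on each group. A sketch of the even split (plain Python; assigning the remainder to the first group follows common MixNet implementations, but is an assumption here):

```python
def split_channels(channels: int, num_groups: int) -> list:
    """Evenly split channels across kernel-size groups;
    the first group absorbs any remainder."""
    base = channels // num_groups
    split = [base] * num_groups
    split[0] += channels - base * num_groups
    return split
```

For instance, 32 channels over kernel sizes [3, 5, 7] split into groups of 12, 10 and 10; the groups are convolved separately and concatenated back.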
mindocr.models.backbones.mindcv_models.mixnet.MixNet
¶
Bases: nn.Cell
MixNet model class, based on
"MixConv: Mixed Depthwise Convolutional Kernels" <https://arxiv.org/abs/1907.09595>_
| PARAMETER | DESCRIPTION |
|---|---|
arch |
size of the architecture. "small", "medium" or "large". Default: "small".
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of the channels of the input. Default: 3.
TYPE:
|
feature_size |
number of channels of the output features. Default: 1536.
TYPE:
|
drop_rate |
rate of dropout for classifier. Default: 0.2.
TYPE:
|
depth_multiplier |
expansion coefficient of channels. Default: 1.0.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\mixnet.py
(lines 226-383)
mindocr.models.backbones.mindcv_models.mixnet.MixNetBlock
¶
Bases: nn.Cell
Basic Block of MixNet
Source code in mindocr\models\backbones\mindcv_models\mixnet.py
(lines 168-223)
mindocr.models.backbones.mindcv_models.mlpmixer
¶MindSpore implementation of MLP-Mixer.
Refer to MLP-Mixer: An all-MLP Architecture for Vision.
mindocr.models.backbones.mindcv_models.mlpmixer.FeedForward
¶
Bases: nn.Cell
Feed Forward Block. MLP Layer. FC -> GELU -> FC
Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py
(lines 21-35)
mindocr.models.backbones.mindcv_models.mlpmixer.MLPMixer
¶
Bases: nn.Cell
MLP-Mixer model class, based on
"MLP-Mixer: An all-MLP Architecture for Vision" <https://arxiv.org/abs/2105.01601>_
| PARAMETER | DESCRIPTION |
|---|---|
depth |
number of MixerBlocks.
TYPE:
|
patch_size |
size of a single image patch.
TYPE:
|
n_patches |
number of patches.
TYPE:
|
n_channels |
channels(dimension) of a single embedded patch.
TYPE:
|
token_dim |
hidden dim of token-mixing MLP.
TYPE:
|
channel_dim |
hidden dim of channel-mixing MLP.
TYPE:
|
n_classes |
number of classification classes.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py
(lines 79-119)
mindocr.models.backbones.mindcv_models.mlpmixer.MixerBlock
¶
Bases: nn.Cell
Mixer Layer with token-mixing MLP and channel-mixing MLP
Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py
(lines 57-76)
mindocr.models.backbones.mindcv_models.mlpmixer.TransPose
¶
Bases: nn.Cell
TransPose Layer. Wraps the Transpose operator for easy integration into nn.SequentialCell.
Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py
(lines 38-54)
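The TransPose wrapper exists because token mixing runs its MLP over the patch axis: the (patches, channels) table is transposed so each "row" is one channel across all patches, mixed, then transposed back before channel mixing. A toy illustration of that axis swap with nested lists (plain Python stand-in for the tensor op):

```python
def transpose(table: list) -> list:
    """Swap the (patches, channels) axes of a nested-list 'tensor'."""
    return [list(col) for col in zip(*table)]


# 3 patch tokens x 2 channels; after transposing, token mixing
# sees one row per channel spanning all 3 patches.
tokens = [[1, 2], [3, 4], [5, 6]]
```

Applying the swap twice recovers the original layout, which is why MixerBlock can sandwich the token-mixing MLP between two transposes.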
mindocr.models.backbones.mindcv_models.mnasnet
¶MindSpore implementation of MnasNet.
Refer to MnasNet: Platform-Aware Neural Architecture Search for Mobile.
mindocr.models.backbones.mindcv_models.mnasnet.Mnasnet
¶
Bases: nn.Cell
MnasNet model architecture from
"MnasNet: Platform-Aware Neural Architecture Search for Mobile" <https://arxiv.org/abs/1807.11626>_.
| PARAMETER | DESCRIPTION |
|---|---|
alpha |
scale factor of model width.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
drop_rate |
dropout rate of the layer before main classifier. Default: 0.2.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\mnasnet.py
(lines 80-176)
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MnasNet model with width scaled by 0.5.
Refer to the base class models.Mnasnet for more details.
Source code in mindocr\models\backbones\mindcv_models\mnasnet.py
(lines 179-189)
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet0_75(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MnasNet model with width scaled by 0.75.
Refer to the base class models.Mnasnet for more details.
Source code in mindocr\models\backbones\mindcv_models\mnasnet.py
(lines 192-202)
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MnasNet model with width scaled by 1.0.
Refer to the base class models.Mnasnet for more details.
Source code in mindocr\models\backbones\mindcv_models\mnasnet.py
(lines 205-215)
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet1_3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MnasNet model with width scaled by 1.3.
Refer to the base class models.Mnasnet for more details.
Source code in mindocr\models\backbones\mindcv_models\mnasnet.py
(lines 218-228)
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet1_4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MnasNet model with width scaled by 1.4.
Refer to the base class models.Mnasnet for more details.
Source code in mindocr\models\backbones\mindcv_models\mnasnet.py
(lines 231-241)
mindocr.models.backbones.mindcv_models.mobilenet_v1
¶MindSpore implementation of MobileNetV1.
Refer to MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
mindocr.models.backbones.mindcv_models.mobilenet_v1.MobileNetV1
¶
Bases: nn.Cell
MobileNetV1 model class, based on
"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" <https://arxiv.org/abs/1704.04861>_ # noqa: E501
| PARAMETER | DESCRIPTION |
|---|---|
alpha |
scale factor of model width. Default: 1.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py
(lines 61-134)
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_025_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV1 model with width scaled by 0.25.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py
(lines 137-148)
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_050_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV1 model with width scaled by 0.5.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py
(lines 151-162)
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_075_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV1 model with width scaled by 0.75.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py
(lines 165-176)
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_100_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV1 model without width scaling.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py
(lines 179-190)
mindocr.models.backbones.mindcv_models.mobilenet_v2
¶MindSpore implementation of MobileNetV2.
Refer to MobileNetV2: Inverted Residuals and Linear Bottlenecks.
mindocr.models.backbones.mindcv_models.mobilenet_v2.InvertedResidual
¶
Bases: nn.Cell
Inverted Residual Block of MobileNetV2
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 122-159)
mindocr.models.backbones.mindcv_models.mobilenet_v2.MobileNetV2
¶
Bases: nn.Cell
MobileNetV2 model class, based on
"MobileNetV2: Inverted Residuals and Linear Bottlenecks" <https://arxiv.org/abs/1801.04381>_
| PARAMETER | DESCRIPTION |
|---|---|
alpha |
scale factor of model width. Default: 1.
TYPE:
|
round_nearest |
divisor of make divisible function. Default: 8.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 162-258)
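The alpha and round_nearest parameters interact through the usual "make divisible" rule: every width-scaled channel count is rounded to a multiple of round_nearest so it maps well onto hardware. A sketch of that rule (plain Python; this mirrors the MobileNetV2 reference formulation, not necessarily this file's exact helper):

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round value to the nearest multiple of divisor, never going below
    min_value and never losing more than 10% of the original value."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * value:     # avoid rounding down by more than 10%
        new_v += divisor
    return new_v
```

For example, scaling a 32-channel layer by alpha=0.75 gives 24 (already a multiple of 8), while alpha=0.35 gives 11.2, which rounds up to 16 because rounding down to 8 would lose more than 10% of the width.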
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.35 and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 541-552)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.35 and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 527-538)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.35 and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 513-524)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.35 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 499-510)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.35 and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 555-566)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.5 and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 471-482)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.5 and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 457-468)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.5 and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 443-454)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.5 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 429-440)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.5 and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 485-496)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.75 and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 401-412)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.75 and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 387-398)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.75 and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 373-384)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.75 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
(lines 359-370)
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 0.75 and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 415–426
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model without width scaling and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 331–342
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model without width scaling and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 317–328
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model without width scaling and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 303–314
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model without width scaling and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 289–300
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model without width scaling and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 345–356
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_130_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 1.3 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 275–286
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_140_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get MobileNetV2 model with width scaled by 1.4 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
lines 261–272
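The variant names above encode the width multiplier (times 100) and the input image size, e.g. `mobilenet_v2_075_128` is width 0.75 at 128×128 input. A hypothetical helper (not part of mindocr) that decodes this naming convention:

```python
def parse_mobilenet_v2_name(name: str):
    """Decode a variant name like 'mobilenet_v2_075_128' into
    (width multiplier alpha, input image size).

    Hypothetical helper illustrating the naming convention; not part of mindocr.
    """
    parts = name.split("_")
    width_code, size = parts[-2], int(parts[-1])
    alpha = int(width_code) / 100.0  # "075" -> 0.75, "100" -> 1.0, "140" -> 1.4
    return alpha, size

print(parse_mobilenet_v2_name("mobilenet_v2_075_128"))  # (0.75, 128)
print(parse_mobilenet_v2_name("mobilenet_v2_140_224"))  # (1.4, 224)
```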
mindocr.models.backbones.mindcv_models.mobilenet_v3
¶MindSpore implementation of MobileNetV3.
Refer to Searching for MobileNetV3.
mindocr.models.backbones.mindcv_models.mobilenet_v3.Bottleneck
¶
Bases: nn.Cell
Bottleneck block of MobileNetV3: depth-wise separable convolutions + inverted residual + squeeze-and-excitation.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
lines 46–98
mindocr.models.backbones.mindcv_models.mobilenet_v3.MobileNetV3
¶
Bases: nn.Cell
MobileNetV3 model class, based on
"Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>_
| PARAMETER | DESCRIPTION |
|---|---|
| `arch` | size of the architecture, either `'small'` or `'large'` |
| `alpha` | scale factor of model width. Default: 1 |
| `round_nearest` | divisor of the make-divisible function. Default: 8 |
| `in_channels` | number of input channels. Default: 3 |
| `num_classes` | number of classification classes. Default: 1000 |
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
lines 101–244
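`round_nearest` is the divisor fed to the make-divisible rule used throughout the MobileNet family: every channel count scaled by `alpha` is rounded to the nearest multiple of the divisor, without shrinking by more than 10%. A sketch of that rule (the exact implementation in mindcv may differ in detail):

```python
def make_divisible(v: float, divisor: int = 8, min_value: int = None) -> int:
    """Round channel count v to the nearest multiple of divisor,
    never going below 90% of the original value."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:  # avoid shrinking channels by more than 10%
        new_v += divisor
    return new_v

print(make_divisible(32 * 0.75))  # 24
print(make_divisible(96 * 0.75))  # 72
```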
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_large_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get large MobileNetV3 model with width scaled by 0.75.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
lines 289–300
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_large_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get large MobileNetV3 model without width scaling.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
lines 261–272
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_small_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get small MobileNetV3 model with width scaled by 0.75.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
lines 275–286
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_small_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get small MobileNetV3 model without width scaling.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
lines 247–258
mindocr.models.backbones.mindcv_models.model_factory
¶mindocr.models.backbones.mindcv_models.model_factory.create_model(model_name, num_classes=1000, pretrained=False, in_channels=3, checkpoint_path='', ema=False, features_only=False, out_indices=[0, 1, 2, 3, 4], **kwargs)
¶Creates model by name.
| PARAMETER | DESCRIPTION |
|---|---|
| `model_name` | the name of the model |
| `num_classes` | the number of classes. Default: 1000 |
| `pretrained` | whether to load the pretrained model. Default: False |
| `in_channels` | the number of input channels. Default: 3 |
| `checkpoint_path` | the path of the checkpoint file. Default: "" |
| `ema` | whether to use the EMA method. Default: False |
| `features_only` | output the features at different strides instead. Default: False |
| `out_indices` | the indices of the output features when `features_only` is enabled. Default: [0, 1, 2, 3, 4] |
Source code in mindocr\models\backbones\mindcv_models\model_factory.py
lines 12–73
mindocr.models.backbones.mindcv_models.nasnet
¶MindSpore implementation of NasNet.
Refer to: Learning Transferable Architectures for Scalable Image Recognition
mindocr.models.backbones.mindcv_models.nasnet.BranchSeparables
¶
Bases: nn.Cell
NasNet model basic architecture
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 60–91
mindocr.models.backbones.mindcv_models.nasnet.BranchSeparablesReduction
¶
Bases: BranchSeparables
NasNet model Residual Connections
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 128–155
mindocr.models.backbones.mindcv_models.nasnet.BranchSeparablesStem
¶
Bases: nn.Cell
NasNet model basic architecture
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 94–125
mindocr.models.backbones.mindcv_models.nasnet.CellStem0
¶
Bases: nn.Cell
NasNet model basic architecture
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 158–223
mindocr.models.backbones.mindcv_models.nasnet.CellStem1
¶
Bases: nn.Cell
NasNet model basic architecture
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 226–343
mindocr.models.backbones.mindcv_models.nasnet.FirstCell
¶
Bases: nn.Cell
NasNet model basic architecture
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 346–435
mindocr.models.backbones.mindcv_models.nasnet.NASNetAMobile
¶
Bases: nn.Cell
NasNet model class, based on
"Learning Transferable Architectures for Scalable Image Recognition" <https://arxiv.org/pdf/1707.07012v4.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| `num_classes` | number of classification classes |
| `stem_filters` | number of stem filters. Default: 32 |
| `penultimate_filters` | number of penultimate filters. Default: 1056 |
| `filters_multiplier` | size of the filters multiplier. Default: 2 |
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 680–870
mindocr.models.backbones.mindcv_models.nasnet.NASNetAMobile.forward_features(x)
¶Network forward feature extraction.
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 833–859
mindocr.models.backbones.mindcv_models.nasnet.NormalCell
¶
Bases: nn.Cell
NasNet model basic architecture
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 438–504
mindocr.models.backbones.mindcv_models.nasnet.ReductionCell0
¶
Bases: nn.Cell
NasNet model Residual Connections
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 507–578
mindocr.models.backbones.mindcv_models.nasnet.ReductionCell1
¶
Bases: nn.Cell
NasNet model Residual Connections
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 581–677
mindocr.models.backbones.mindcv_models.nasnet.SeparableConv2d
¶
Bases: nn.Cell
depth-wise convolutions + point-wise convolutions
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 35–57
mindocr.models.backbones.mindcv_models.nasnet.nasnet_a_4x1056(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get NasNet model.
Refer to the base class models.NASNetAMobile for more details.
Source code in mindocr\models\backbones\mindcv_models\nasnet.py
lines 873–881
mindocr.models.backbones.mindcv_models.path
¶Utility of file path
mindocr.models.backbones.mindcv_models.path.detect_file_type(filename)
¶Detect file type by suffixes and return tuple(suffix, archive_type, compression).
Source code in mindocr\models\backbones\mindcv_models\path.py
lines 21–44
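A sketch of such suffix-based detection, returning the `(suffix, archive_type, compression)` tuple; the actual suffix tables and error handling in mindcv's `path.py` may differ:

```python
import os

# Assumed suffix tables for illustration; the real module may list more.
ARCHIVE_SUFFIXES = {".tar", ".zip"}
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz"}

def detect_file_type(filename: str):
    """Return (suffix, archive_type, compression) inferred from the file name."""
    root, suffix = os.path.splitext(filename)
    if suffix in ARCHIVE_SUFFIXES:
        return suffix, suffix, None               # plain archive, e.g. foo.tar
    if suffix in COMPRESSION_SUFFIXES:
        root2, suffix2 = os.path.splitext(root)
        if suffix2 in ARCHIVE_SUFFIXES:           # compressed archive, e.g. foo.tar.gz
            return suffix2 + suffix, suffix2, suffix
        return suffix, None, suffix               # compressed file, e.g. foo.txt.gz
    raise RuntimeError(f"Unknown file type for {filename!r}")

print(detect_file_type("weights.tar.gz"))  # ('.tar.gz', '.tar', '.gz')
```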
mindocr.models.backbones.mindcv_models.pit
¶MindSpore implementation of PiT.
Refer to Rethinking Spatial Dimensions of Vision Transformers.
mindocr.models.backbones.mindcv_models.pit.Attention
¶
Bases: nn.Cell
define multi-head self attention block
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 109–152
mindocr.models.backbones.mindcv_models.pit.Block
¶
Bases: nn.Cell
define the basic block of PiT
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 155–181
mindocr.models.backbones.mindcv_models.pit.Mlp
¶
Bases: nn.Cell
MLP as used in Vision Transformer, MLP-Mixer and related networks
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 184–209
mindocr.models.backbones.mindcv_models.pit.PoolingTransformer
¶
Bases: nn.Cell
PiT model class, based on
"Rethinking Spatial Dimensions of Vision Transformers"
<https://arxiv.org/abs/2103.16302>
| PARAMETER | DESCRIPTION |
|---|---|
| `image_size` | input image size |
| `patch_size` | image patch size |
| `stride` | stride of the depthwise conv |
| `base_dims` | middle dim of each layer |
| `depth` | model block depth of each layer |
| `heads` | number of heads of multi-head attention in each layer |
| `mlp_ratio` | ratio of hidden features in Mlp |
| `num_classes` | number of classification classes. Default: 1000 |
| `in_chans` | number of input channels. Default: 3 |
| `attn_drop_rate` | attention layers dropout rate. Default: 0 |
| `drop_rate` | dropout rate. Default: 0 |
| `drop_path_rate` | drop path rate. Default: 0 |
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 264–399
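PiT's defining trick is that each `conv_head_pooling` stage halves the spatial token grid while widening channels, so the token sequence shortens as depth grows. A back-of-the-envelope helper that tracks the grid side length, assuming square inputs, a conv patch embedding, and stride-2 pooling between stages (the exact padding in `pit.py` may differ):

```python
def pit_token_grids(image_size: int, patch_size: int, patch_stride: int, num_stages: int):
    """Side length of the (square) spatial token grid after the conv patch
    embedding and after each stride-2 pooling stage."""
    side = (image_size - patch_size) // patch_stride + 1  # conv embedding output
    grids = [side]
    for _ in range(num_stages - 1):
        side = (side + 1) // 2  # stride-2 conv pooling with padding (ceil halving)
        grids.append(side)
    return grids

# 224x224 input, 16x16 patches with stride 8, three stages
print(pit_token_grids(image_size=224, patch_size=16, patch_stride=8, num_stages=3))  # [27, 14, 7]
```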
mindocr.models.backbones.mindcv_models.pit.Transformer
¶
Bases: nn.Cell
define the transformer block of PiT
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 212–261
mindocr.models.backbones.mindcv_models.pit.conv_embedding
¶
Bases: nn.Cell
define embedding layer using conv2d
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 51–75
mindocr.models.backbones.mindcv_models.pit.conv_head_pooling
¶
Bases: nn.Cell
define a pooling layer using a conv on the spatial tokens, with an additional fully-connected layer to adjust the channel size to match the spatial tokens
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 78–106
mindocr.models.backbones.mindcv_models.pit.pit_b(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PiT-B model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 426–447
mindocr.models.backbones.mindcv_models.pit.pit_s(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PiT-S model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 450–471
mindocr.models.backbones.mindcv_models.pit.pit_ti(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PiT-Ti model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 402–423
mindocr.models.backbones.mindcv_models.pit.pit_xs(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PiT-XS model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindocr\models\backbones\mindcv_models\pit.py
lines 474–495
mindocr.models.backbones.mindcv_models.poolformer
¶MindSpore implementation of poolformer.
Refer to PoolFormer: MetaFormer Is Actually What You Need for Vision.
mindocr.models.backbones.mindcv_models.poolformer.ConvMlp
¶
Bases: nn.Cell
MLP using 1x1 convs that keeps spatial dims
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 62–103
mindocr.models.backbones.mindcv_models.poolformer.ConvMlp.cls_init_weights()
¶Initialize weights for cells.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 87–95
mindocr.models.backbones.mindcv_models.poolformer.PatchEmbed
¶
Bases: nn.Cell
Patch Embedding implemented by a conv layer. Input: tensor of shape [B, C, H, W]; output: tensor of shape [B, C, H/stride, W/stride].
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 106–123
mindocr.models.backbones.mindcv_models.poolformer.PoolFormer
¶
Bases: nn.Cell
PoolFormer model class, based on
"MetaFormer Is Actually What You Need for Vision" <https://arxiv.org/pdf/2111.11418v3.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| `layers` | number of blocks for the 4 stages |
| `embed_dims` | the embedding dims for the 4 stages. Default: (64, 128, 320, 512) |
| `mlp_ratios` | mlp ratios for the 4 stages. Default: (4, 4, 4, 4) |
| `downsamples` | flags for whether to apply downsampling. Default: (True, True, True, True) |
| `pool_size` | the pooling size for the 4 stages. Default: 3 |
| `in_chans` | number of input channels. Default: 3 |
| `num_classes` | number of classes for image classification. Default: 1000 |
| `global_pool` | type of pooling layer. Default: avg |
| `norm_layer` | type of normalization. Default: nn.GroupNorm |
| `act_layer` | type of activation. Default: nn.GELU |
| `in_patch_size` | patch size of the patch embedding for the input image. Default: 7 |
| `in_stride` | stride of the patch embedding for the input image. Default: 4 |
| `in_pad` | padding of the patch embedding for the input image. Default: 2 |
| `down_patch_size` | patch size of the downsample patch embedding. Default: 3 |
| `down_stride` | stride of the downsample patch embedding. Default: 2 |
| `down_pad` | padding of the downsample patch embedding. Default: 1 |
| `drop_rate` | dropout rate of the layer before the main classifier. Default: 0 |
| `drop_path_rate` | stochastic depth rate. Default: 0 |
| `layer_scale_init_value` | LayerScale initial value. Default: 1e-5 |
| `fork_feat` | whether to output features of the 4 stages, for dense prediction. Default: False |
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 203–320
mindocr.models.backbones.mindcv_models.poolformer.PoolFormer.cls_init_weights()
¶Initialize weights for cells.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 290–298
mindocr.models.backbones.mindcv_models.poolformer.PoolFormerBlock
¶
Bases: nn.Cell
Implementation of one PoolFormer block.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 135–174
mindocr.models.backbones.mindcv_models.poolformer.basic_blocks(dim, index, layers, pool_size=3, mlp_ratio=4.0, act_layer=nn.GELU, norm_layer=nn.GroupNorm, drop_rate=0.0, drop_path_rate=0.0, layer_scale_init_value=1e-05)
¶generate PoolFormer blocks for a stage
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 177–200
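Stage builders like `basic_blocks` typically give each block a stochastic-depth probability that grows linearly with the block's index across the whole network, which is the usual way a single `drop_path_rate` is consumed. A pure-Python sketch of that schedule (the exact formula in `poolformer.py` may differ slightly):

```python
def drop_path_schedule(layers, drop_path_rate):
    """Per-block drop-path rates, increasing linearly from 0 to drop_path_rate
    over all blocks. `layers` gives the number of blocks per stage."""
    total = sum(layers)
    rates, block_idx = [], 0
    for n in layers:
        stage = []
        for _ in range(n):
            stage.append(drop_path_rate * block_idx / max(total - 1, 1))
            block_idx += 1
        rates.append(stage)
    return rates

sched = drop_path_schedule([2, 2, 6, 2], drop_path_rate=0.1)
print(sched[0][0], round(sched[-1][-1], 3))  # 0.0 0.1
```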
mindocr.models.backbones.mindcv_models.poolformer.poolformer_m36(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get poolformer_m36 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 358–375
mindocr.models.backbones.mindcv_models.poolformer.poolformer_m48(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get poolformer_m48 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 378–395
mindocr.models.backbones.mindcv_models.poolformer.poolformer_s12(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get poolformer_s12 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 323–331
mindocr.models.backbones.mindcv_models.poolformer.poolformer_s24(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get poolformer_s24 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 334–342
mindocr.models.backbones.mindcv_models.poolformer.poolformer_s36(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get poolformer_s36 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindocr\models\backbones\mindcv_models\poolformer.py
lines 345–355
mindocr.models.backbones.mindcv_models.pvt
¶MindSpore implementation of PVT.
Refer to PVT: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
mindocr.models.backbones.mindcv_models.pvt.Attention
¶
Bases: nn.Cell
spatial-reduction attention (SRA)
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 48–109
mindocr.models.backbones.mindcv_models.pvt.Block
¶
Bases: nn.Cell
Block with spatial-reduction attention (SRA) and feed forward
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 112–134
mindocr.models.backbones.mindcv_models.pvt.PatchEmbed
¶
Bases: nn.Cell
Image to Patch Embedding
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 137–166
mindocr.models.backbones.mindcv_models.pvt.PyramidVisionTransformer
¶
Bases: nn.Cell
Pyramid Vision Transformer model class, based on
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions" <https://arxiv.org/abs/2102.12122>_
| PARAMETER | DESCRIPTION |
|---|---|
| `img_size` | size of an input image |
| `patch_size` | size of a single image patch |
| `in_chans` | number of input channels. Default: 3 |
| `num_classes` | number of classification classes. Default: 1000 |
| `embed_dims` | hidden dims of the PatchEmbed in each stage |
| `num_heads` | number of attention heads in each stage |
| `mlp_ratios` | ratios of MLP hidden dims in each stage |
| `qkv_bias` | whether to use bias in attention |
| `qk_scale` | scale multiplied by qk in attention if not None, otherwise head_dim ** -0.5 |
| `drop_rate` | drop rate for each block. Default: 0.0 |
| `attn_drop_rate` | drop rate for attention. Default: 0.0 |
| `drop_path_rate` | drop rate for drop path. Default: 0.0 |
| `norm_layer` | norm layer used in blocks. Default: nn.LayerNorm |
| `depths` | number of Blocks in each stage |
| `sr_ratios` | stride and kernel size of each attention's spatial reduction |
| `num_stages` | number of stages. Default: 4 |
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 169–350
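Spatial-reduction attention shrinks the key/value sequence by a factor of `sr_ratio` along each spatial axis before attention, so the attention score matrix has shape HW × HW/R² instead of HW × HW. A quick cost comparison, under the usual assumption that cost scales with the attention-matrix size (the 56×56 grid and `sr_ratio=8` are taken as an example of an early high-resolution stage):

```python
def attention_matrix_sizes(h: int, w: int, sr_ratio: int):
    """Sizes of the attention score matrix without and with spatial reduction.
    Keys/values are downsampled by sr_ratio along each spatial axis."""
    n = h * w                                  # query sequence length
    n_kv = (h // sr_ratio) * (w // sr_ratio)   # reduced key/value length
    return n * n, n * n_kv

full, reduced = attention_matrix_sizes(56, 56, sr_ratio=8)
print(full // reduced)  # 64 -> sr_ratio**2 fewer attention scores
```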
mindocr.models.backbones.mindcv_models.pvt.pvt_large(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVT large model. Refer to the base class "models.PVT" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 413–430
mindocr.models.backbones.mindcv_models.pvt.pvt_medium(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVT medium model. Refer to the base class "models.PVT" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 393–410
mindocr.models.backbones.mindcv_models.pvt.pvt_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVT small model. Refer to the base class "models.PVT" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 373–390
mindocr.models.backbones.mindcv_models.pvt.pvt_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVT tiny model. Refer to the base class "models.PVT" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvt.py
lines 353–370
mindocr.models.backbones.mindcv_models.pvtv2
¶MindSpore implementation of PVTv2.
Refer to PVTv2: PVTv2: Improved Baselines with Pyramid Vision Transformer
mindocr.models.backbones.mindcv_models.pvtv2.Attention
¶
Bases: nn.Cell
Linear Spatial Reduction Attention
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
lines 93–167
mindocr.models.backbones.mindcv_models.pvtv2.Block
¶
Bases: nn.Cell
Block with Linear Spatial Reduction Attention and Convolutional Feed-Forward
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
lines 170–195
mindocr.models.backbones.mindcv_models.pvtv2.DWConv
¶
Bases: nn.Cell
Depthwise separable convolution
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
lines 49–62
mindocr.models.backbones.mindcv_models.pvtv2.Mlp
¶
Bases: nn.Cell
MLP with depthwise separable convolution
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 | |
mindocr.models.backbones.mindcv_models.pvtv2.OverlapPatchEmbed
¶
Bases: nn.Cell
Overlapping Patch Embedding
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
lines 198–222
mindocr.models.backbones.mindcv_models.pvtv2.PyramidVisionTransformerV2
¶
Bases: nn.Cell
Pyramid Vision Transformer V2 model class, based on
"PVTv2: Improved Baselines with Pyramid Vision Transformer" <https://arxiv.org/abs/2106.13797>_
| PARAMETER | DESCRIPTION |
|---|---|
| `img_size` | size of an input image |
| `patch_size` | size of a single image patch |
| `in_chans` | number of input channels. Default: 3 |
| `num_classes` | number of classification classes. Default: 1000 |
| `embed_dims` | hidden dims of the PatchEmbed in each stage |
| `num_heads` | number of attention heads in each stage |
| `mlp_ratios` | ratios of MLP hidden dims in each stage |
| `qkv_bias` | whether to use bias in attention |
| `qk_scale` | scale multiplied by qk in attention if not None, otherwise head_dim ** -0.5 |
| `drop_rate` | drop rate for each block. Default: 0.0 |
| `attn_drop_rate` | drop rate for attention. Default: 0.0 |
| `drop_path_rate` | drop rate for drop path. Default: 0.0 |
| `norm_layer` | norm layer used in blocks. Default: nn.LayerNorm |
| `depths` | number of Blocks in each stage |
| `sr_ratios` | stride and kernel size of each attention's spatial reduction |
| `num_stages` | number of stages. Default: 4 |
| `linear` | whether to use linear SRA |
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
(lines 225-346)
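The `depths` and `drop_path_rate` arguments interact: PVT-style models typically distribute a linearly increasing stochastic-depth rate over all blocks of all stages. A minimal sketch of that schedule (the helper name `drop_path_rates` is ours, not part of the mindocr API):

```python
def drop_path_rates(drop_path_rate, depths):
    """Linearly increasing per-block stochastic-depth rates across all stages."""
    total = sum(depths)
    flat = [drop_path_rate * i / max(total - 1, 1) for i in range(total)]
    # split the flat schedule back into one list per stage
    per_stage, idx = [], 0
    for d in depths:
        per_stage.append(flat[idx:idx + d])
        idx += d
    return per_stage
```

For `depths=(2, 2, 6, 2)` and `drop_path_rate=0.1`, the first block gets rate 0.0 and the last block gets the full 0.1.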
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVTv2-b0 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
(lines 349-365)
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVTv2-b1 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
(lines 368-384)
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVTv2-b2 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
(lines 387-403)
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVTv2-b3 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
(lines 406-421)
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVTv2-b4 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
(lines 424-439)
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get PVTv2-b5 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
(lines 442-457)
mindocr.models.backbones.mindcv_models.registry
¶model registry and list
mindocr.models.backbones.mindcv_models.registry.get_pretrained_cfg_value(model_name, cfg_key)
¶Get a specific model default_cfg value by key. None if it doesn't exist.
Source code in mindocr\models\backbones\mindcv_models\registry.py
(lines 128-132)
mindocr.models.backbones.mindcv_models.registry.has_pretrained_cfg_key(model_name, cfg_key)
¶Query model default_cfgs for existence of a specific key.
Source code in mindocr\models\backbones\mindcv_models\registry.py
(lines 135-139)
mindocr.models.backbones.mindcv_models.registry.is_model(model_name)
¶Check if a model name exists
Source code in mindocr\models\backbones\mindcv_models\registry.py
(lines 85-89)
mindocr.models.backbones.mindcv_models.registry.is_model_in_modules(model_name, module_names)
¶Check if a model exists within a subset of modules
Source code in mindocr\models\backbones\mindcv_models\registry.py
(lines 107-115)
mindocr.models.backbones.mindcv_models.registry.list_modules()
¶Return list of module names that contain models / model entrypoints
Source code in mindocr\models\backbones\mindcv_models\registry.py
(lines 99-104)
mindocr.models.backbones.mindcv_models.registry.model_entrypoint(model_name)
¶Fetch a model entrypoint for specified model name
Source code in mindocr\models\backbones\mindcv_models\registry.py
(lines 92-96)
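The registry functions above implement a simple name-to-constructor lookup. A minimal, framework-free sketch of the same pattern (function names mirror the API; the `toy_resnet50` entry is hypothetical, for illustration only):

```python
_model_entrypoints = {}

def register_model(fn):
    """Decorator: record a model constructor under its function name."""
    _model_entrypoints[fn.__name__] = fn
    return fn

def is_model(model_name):
    """Check if a model name exists in the registry."""
    return model_name in _model_entrypoints

def model_entrypoint(model_name):
    """Fetch the constructor registered for model_name."""
    if model_name not in _model_entrypoints:
        raise KeyError(f"Unknown model: {model_name}")
    return _model_entrypoints[model_name]

@register_model
def toy_resnet50(**kwargs):  # hypothetical entry, stands in for a real model fn
    return ("toy_resnet50", kwargs)
```

`list_modules` and `is_model_in_modules` extend this with a second mapping from model names to the modules that registered them.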
mindocr.models.backbones.mindcv_models.regnet
¶MindSpore implementation of RegNet.
Refer to: Designing Network Design Spaces
mindocr.models.backbones.mindcv_models.regnet.AnyHead
¶
Bases: nn.Cell
AnyNet head: optional conv, AvgPool, 1x1.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 308-326)
mindocr.models.backbones.mindcv_models.regnet.AnyNet
¶
Bases: nn.Cell
AnyNet model.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 354-424)
mindocr.models.backbones.mindcv_models.regnet.AnyStage
¶
Bases: nn.Cell
AnyNet stage (sequence of blocks w/ the same output shape).
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 291-305)
mindocr.models.backbones.mindcv_models.regnet.BasicTransform
¶
Bases: nn.Cell
Basic transformation: [3x3 conv, BN, Relu] x2.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 192-210)
mindocr.models.backbones.mindcv_models.regnet.BottleneckTransform
¶
Bases: nn.Cell
Bottleneck transformation: 1x1, 3x3 [+SE], 1x1.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 230-259)
mindocr.models.backbones.mindcv_models.regnet.RegNet
¶
Bases: AnyNet
RegNet model class, based on
"Designing Network Design Spaces" <https://arxiv.org/abs/2003.13678>_
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 469-500)
mindocr.models.backbones.mindcv_models.regnet.RegNet.regnet_get_params(w_a, w_0, w_m, d, stride, bot_mul, group_w, stem_type, stem_w, block_type, head_w, num_classes, se_r)
staticmethod
¶Get AnyNet parameters that correspond to the RegNet.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 474-491)
mindocr.models.backbones.mindcv_models.regnet.ResBasicBlock
¶
Bases: nn.Cell
Residual basic block: x + f(x), f = basic transform.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 213-227)
mindocr.models.backbones.mindcv_models.regnet.ResBottleneckBlock
¶
Bases: nn.Cell
Residual bottleneck block: x + f(x), f = bottleneck transform.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 262-276)
mindocr.models.backbones.mindcv_models.regnet.ResBottleneckLinearBlock
¶
Bases: nn.Cell
Residual linear bottleneck block: x + f(x), f = bottleneck transform.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 279-288)
mindocr.models.backbones.mindcv_models.regnet.ResStem
¶
Bases: nn.Cell
ResNet stem for ImageNet: 7x7, BN, AF, MaxPool.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 136-151)
mindocr.models.backbones.mindcv_models.regnet.ResStemCifar
¶
Bases: nn.Cell
ResNet stem for CIFAR: 3x3, BN, AF.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 120-133)
mindocr.models.backbones.mindcv_models.regnet.SimpleStem
¶
Bases: nn.Cell
Simple stem for ImageNet: 3x3, BN, AF.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 154-167)
mindocr.models.backbones.mindcv_models.regnet.VanillaBlock
¶
Bases: nn.Cell
Vanilla block: [3x3 conv, BN, Relu] x2.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 170-189)
mindocr.models.backbones.mindcv_models.regnet.activation()
¶Helper for building an activation layer.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 115-117)
mindocr.models.backbones.mindcv_models.regnet.adjust_block_compatibility(ws, bs, gs)
¶Adjusts the compatibility of widths, bottlenecks, and groups.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 427-438)
mindocr.models.backbones.mindcv_models.regnet.conv2d(w_in, w_out, k, *, stride=1, groups=1, bias=False)
¶Helper for building a conv2d layer.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 84-88)
mindocr.models.backbones.mindcv_models.regnet.gap2d(keep_dims=False)
¶Helper for building a gap2d layer.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 105-107)
mindocr.models.backbones.mindcv_models.regnet.generate_regnet(w_a, w_0, w_m, d, q=8)
¶Generates per stage widths and depths from RegNet parameters.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 441-456)
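`generate_regnet` turns the four parameters (w_a, w_0, w_m, d) into per-block widths by sampling the line u_j = w_0 + w_a · j and quantizing it to powers of w_m, rounded to multiples of q. A NumPy sketch of the published procedure (a simplified illustration, not the exact mindocr source):

```python
import numpy as np

def generate_regnet(w_a, w_0, w_m, d, q=8):
    """Per-block widths from RegNet parameters, quantized to multiples of q."""
    ws_cont = np.arange(d) * w_a + w_0                  # continuous widths u_j
    ks = np.round(np.log(ws_cont / w_0) / np.log(w_m))  # quantization exponents
    ws = w_0 * np.power(w_m, ks)                        # snap to powers of w_m
    ws = (np.round(ws / q) * q).astype(int)             # round to multiples of q
    num_stages = len(np.unique(ws))                     # one stage per distinct width
    return ws.tolist(), num_stages
```

Blocks sharing the same quantized width form one stage, which is why `num_stages` is the number of distinct widths.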
mindocr.models.backbones.mindcv_models.regnet.generate_regnet_full(w_a, w_0, w_m, d, stride, bot_mul, group_w)
¶Generates per stage ws, ds, gs, bs, and ss from RegNet cfg.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 459-466)
mindocr.models.backbones.mindcv_models.regnet.get_block_fun(block_type)
¶Retrieves the block function by name.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 341-351)
mindocr.models.backbones.mindcv_models.regnet.get_stem_fun(stem_type)
¶Retrieves the stem function by name.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 329-338)
mindocr.models.backbones.mindcv_models.regnet.linear(w_in, w_out, *, bias=False)
¶Helper for building a linear layer.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 110-112)
mindocr.models.backbones.mindcv_models.regnet.norm2d(w_in, eps=1e-05, mom=0.9)
¶Helper for building a norm2d layer.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 91-93)
mindocr.models.backbones.mindcv_models.regnet.pool2d(_w_in, k, *, stride=1)
¶Helper for building a pool2d layer.
Source code in mindocr\models\backbones\mindcv_models\regnet.py
(lines 96-102)
mindocr.models.backbones.mindcv_models.repmlp
¶MindSpore implementation of RepMLP.
Refer to RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality.
mindocr.models.backbones.mindcv_models.repmlp.FFNBlock
¶
Bases: nn.Cell
Common FFN layer
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 238-253)
mindocr.models.backbones.mindcv_models.repmlp.GlobalPerceptron
¶
Bases: nn.Cell
GlobalPerceptron layer that provides global information (one of the three components of RepMLPBlock)
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 83-107)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPBlock
¶
Bases: nn.Cell
Basic RepMLPBlock layer (composed of the Global Perceptron, Channel Perceptron, and Local Perceptron)
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 110-235)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet
¶
Bases: nn.Cell
RepMLPNet model class, based on
"RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality" <https://arxiv.org/pdf/2112.11081v2.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | Number of input channels. Default: 3. |
| `num_classes` | Number of classification classes. Default: 1000. |
| `patch_size` | Size of a single image patch. Default: (4, 4). |
| `num_blocks` | Number of blocks per stage. Default: (2, 2, 6, 2). |
| `channels` | Input (`channels[stage_idx]`) and output (`channels[stage_idx + 1]`) channel counts per stage. Default: (192, 384, 768, 1536). |
| `hs` | Feature-map height per stage. Default: (64, 32, 16, 8). |
| `ws` | Feature-map width per stage. Default: (64, 32, 16, 8). |
| `sharesets_nums` | Number of share sets per stage. Default: (4, 8, 16, 32). |
| `reparam_conv_k` | Convolution kernel sizes in the Local Perceptron. Default: (3,). |
| `globalperceptron_reduce` | Channel reduction factor of the intermediate convolution in the Global Perceptron (`out_channels = in_channels / globalperceptron_reduce`). Default: 4. |
| `use_checkpoint` | Whether to use gradient checkpointing. |
| `deploy` | Whether to build the re-parameterized (inference-time) structure. |
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 276-377)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNetUnit
¶
Bases: nn.Cell
Basic unit of RepMLPNet
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 256-273)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_B224(pretrained=False, image_size=224, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶Get RepMLPNet_B224 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 419-432)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_B256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶Get RepMLPNet_B256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 435-448)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_D256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶Get RepMLPNet_D256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 451-464)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_L256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶Get RepMLPNet_L256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 467-480)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_T224(pretrained=False, image_size=224, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶Get RepMLPNet_T224 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 386-400)
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_T256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶Get RepMLPNet_T256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindocr\models\backbones\mindcv_models\repmlp.py
(lines 403-416)
mindocr.models.backbones.mindcv_models.repvgg
¶MindSpore implementation of RepVGG.
Refer to RepVGG: Making VGG-style ConvNets Great Again
mindocr.models.backbones.mindcv_models.repvgg.RepVGG
¶
Bases: nn.Cell
RepVGG model class, based on
"RepVGG: Making VGG-style ConvNets Great Again" <https://arxiv.org/pdf/2101.03697>_
| PARAMETER | DESCRIPTION |
|---|---|
| `num_blocks` | Number of RepVGGBlocks in each stage. |
| `num_classes` | Number of classification classes. Default: 1000. |
| `in_channels` | Number of input channels. Default: 3. |
| `width_multiplier` | Width scaling factor for each stage. |
| `override_group_map` | Optional mapping from layer index to the number of groups used in that layer's convolution. |
| `deploy` | Whether to use the re-parameterized (rbr_reparam) block. Default: False. |
| `use_se` | Whether to use the SE block. Default: False. |
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 194-278)
mindocr.models.backbones.mindcv_models.repvgg.RepVGGBlock
¶
Bases: nn.Cell
Basic Block of RepVGG
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 61-191)
mindocr.models.backbones.mindcv_models.repvgg.RepVGGBlock.get_custom_l2()
¶This may improve the accuracy and facilitate quantization in some cases.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 110-126)
mindocr.models.backbones.mindcv_models.repvgg.RepVGGBlock.switch_to_deploy()
¶Convert the training-time multi-branch block to its deploy-time single-conv form.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 171-191)
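`switch_to_deploy` folds the three training-time branches (3x3 conv + BN, 1x1 conv + BN, identity BN) into a single 3x3 kernel and bias. The two core algebraic steps can be sketched in NumPy (a simplified illustration of re-parameterization, not the mindocr source):

```python
import numpy as np

def fuse_conv_bn(kernel, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the preceding conv's kernel and bias."""
    std = np.sqrt(var + eps)
    scale = (gamma / std).reshape(-1, 1, 1, 1)  # per-output-channel scale
    return kernel * scale, beta - mean * gamma / std

def pad_1x1_to_3x3(kernel_1x1):
    """Zero-pad a 1x1 kernel so it can be summed with a 3x3 kernel."""
    return np.pad(kernel_1x1, ((0, 0), (0, 0), (1, 1), (1, 1)))
```

The deploy kernel is then `fused_3x3 + pad_1x1_to_3x3(fused_1x1) + identity_kernel`, and the three biases add up likewise, so the fused block computes the same function as the three branches.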
mindocr.models.backbones.mindcv_models.repvgg.repvgg_a0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[0.75, 0.75, 0.75, 2.5].
Refer to the base class models.RepVGG for more details.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 281-292)
mindocr.models.backbones.mindcv_models.repvgg.repvgg_a1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5].
Refer to the base class models.RepVGG for more details.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 295-307)
mindocr.models.backbones.mindcv_models.repvgg.repvgg_a2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.5, 1.5, 1.5, 2.75].
Refer to the base class models.RepVGG for more details.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 310-322)
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5].
Refer to the base class models.RepVGG for more details.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 325-337)
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.0, 2.0, 2.0, 4.0].
Refer to the base class models.RepVGG for more details.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 340-352)
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.5, 2.5, 2.5, 5.0].
Refer to the base class models.RepVGG for more details.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 355-367)
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[3.0, 3.0, 3.0, 5.0].
Refer to the base class models.RepVGG for more details.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 370-382)
mindocr.models.backbones.mindcv_models.repvgg.repvgg_model_convert(model, save_path=None, do_copy=True)
¶Convert a RepVGG model to its re-parameterized deploy form.
Source code in mindocr\models\backbones\mindcv_models\repvgg.py
(lines 385-394)
mindocr.models.backbones.mindcv_models.res2net
¶MindSpore implementation of Res2Net.
Refer to Res2Net: A New Multi-scale Backbone Architecture.
mindocr.models.backbones.mindcv_models.res2net.Res2Net
¶
Bases: nn.Cell
Res2Net model class, based on
"Res2Net: A New Multi-scale Backbone Architecture" <https://arxiv.org/abs/1904.01169>_
| PARAMETER | DESCRIPTION |
|---|---|
| `block` | Residual block of the network. |
| `layer_nums` | Number of layers in each stage. |
| `version` | Variant of Res2Net, 'res2net' or 'res2net_v1b'. Default: 'res2net'. |
| `num_classes` | Number of classification classes. Default: 1000. |
| `in_channels` | Number of input channels. Default: 3. |
| `groups` | Number of groups for the group conv in blocks. Default: 1. |
| `base_width` | Base width of the per-group hidden channels in blocks. Default: 26. |
| `scale` | Scale factor of Bottle2neck. Default: 4. |
| `norm` | Normalization layer in blocks. Default: None. |
Source code in mindocr\models\backbones\mindcv_models\res2net.py
(lines 140-307)
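The `scale` parameter controls the hierarchical split inside a Bottle2neck: the hidden channels are divided into `scale` groups, and each group after the first receives the previous group's output before its 3x3 conv. A NumPy sketch of the data flow only, with the convs stubbed out as identity (not the mindocr source):

```python
import numpy as np

def bottle2neck_split(x, scale):
    """Hierarchical aggregation of Bottle2neck splits (convs stubbed as identity)."""
    splits = np.split(x, scale, axis=1)  # split channels into `scale` groups
    sp = splits[0]
    outs = [sp]
    for i in range(1, scale):
        sp = sp + splits[i]              # stand-in for conv3x3(sp + split_i)
        outs.append(sp)
    return np.concatenate(outs, axis=1)  # re-concatenate along channels
```

Because each group sees the accumulated output of the earlier groups, the block mixes features at several receptive-field sizes within one residual unit.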
mindocr.models.backbones.mindcv_models.res2net.res2net101(pretrained=False, num_classes=1001, in_channels=3, **kwargs)
¶Get 101 layers Res2Net model.
Refer to the base class models.Res2Net for more details.
Source code in mindocr\models\backbones\mindcv_models\res2net.py
(lines 324-335)
mindocr.models.backbones.mindcv_models.res2net.res2net152(pretrained=False, num_classes=1001, in_channels=3, **kwargs)
¶Get 152 layers Res2Net model.
Refer to the base class models.Res2Net for more details.
Source code in mindocr\models\backbones\mindcv_models\res2net.py
(lines 338-349)
mindocr.models.backbones.mindcv_models.res2net.res2net50(pretrained=False, num_classes=1001, in_channels=3, **kwargs)
¶Get 50 layers Res2Net model.
Refer to the base class models.Res2Net for more details.
Source code in mindocr\models\backbones\mindcv_models\res2net.py
(lines 310-321)
mindocr.models.backbones.mindcv_models.resnest
¶MindSpore implementation of ResNeSt.
Refer to ResNeSt: Split-Attention Networks.
mindocr.models.backbones.mindcv_models.resnest.Bottleneck
¶
Bases: nn.Cell
ResNeSt Bottleneck
Source code in mindocr\models\backbones\mindcv_models\resnest.py
(lines 144-221)
mindocr.models.backbones.mindcv_models.resnest.ResNeSt
¶
Bases: nn.Cell
ResNeSt model class, based on
"ResNeSt: Split-Attention Networks" <https://arxiv.org/abs/2004.08955>_
| PARAMETER | DESCRIPTION |
|---|---|
| `block` | Class for the residual block. Option is Bottleneck. |
| `layers` | Number of blocks in each stage. |
| `radix` | Number of groups for the Split-Attention conv. Default: 1. |
| `group` | Number of groups for the conv in each bottleneck block. Default: 1. |
| `bottleneck_width` | Bottleneck channel factor. Default: 64. |
| `num_classes` | Number of classification classes. Default: 1000. |
| `dilated` | Apply the dilation strategy to obtain a stride-8 model, typically used in semantic segmentation. Default: False. |
| `dilation` | Dilation of the conv. Default: 1. |
| `deep_stem` | Use three 3x3 convolution layers of widths stem_width, stem_width, stem_width * 2 as the stem. Default: False. |
| `stem_width` | Number of channels in the stem convolutions. Default: 64. |
| `avg_down` | Use average pooling in the projection skip connection between stages (downsampling). Default: False. |
| `avd` | Use average pooling before or after the split-attention conv. Default: False. |
| `avd_first` | Apply the `avd` average pooling before (True) or after (False) the split-attention conv. Default: False. |
| `drop_rate` | Drop probability of the Dropout layer. Default: 0. |
| `norm_layer` | Normalization layer used in the backbone network. Default: nn.BatchNorm2d. |
Source code in mindocr\models\backbones\mindcv_models\resnest.py
(lines 224-444)
mindocr.models.backbones.mindcv_models.resnest.SplitAttn
¶
Bases: nn.Cell
Split-Attention Conv2d
Source code in mindocr\models\backbones\mindcv_models\resnest.py
(lines 72-141)
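SplitAttn combines the `radix` feature splits using per-channel weights obtained from a softmax over the radix axis (rSoftMax). A NumPy sketch of that combination step, with the surrounding convs and pooling omitted (an illustration, not the mindocr source):

```python
import numpy as np

def radix_combine(splits, logits):
    """Weight `radix` splits by a softmax over the radix axis and sum them.

    splits: list of `radix` arrays, each (batch, channels, h, w)
    logits: (batch, radix, channels) attention logits
    """
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)  # weights sum to 1 over the radix axis
    return sum(w[:, r, :, None, None] * splits[r] for r in range(len(splits)))
```

With equal logits the output is the plain average of the splits; training moves the logits so that each channel attends to the most informative split.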
mindocr.models.backbones.mindcv_models.resnet
¶MindSpore implementation of ResNet.
Refer to Deep Residual Learning for Image Recognition.
mindocr.models.backbones.mindcv_models.resnet.BasicBlock
¶
Bases: nn.Cell
Basic block of ResNet
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 58-103)
mindocr.models.backbones.mindcv_models.resnet.Bottleneck
¶
Bases: nn.Cell
Bottleneck places the stride for downsampling at the 3x3 convolution (self.conv2), as torchvision does, while the original implementation places the stride at the first 1x1 convolution (self.conv1)
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 106-160)
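The `groups` and `base_width` parameters determine the hidden width of the Bottleneck's 3x3 convolution, which is how the resnext variants below widen their group conv. A sketch of the torchvision-style formula (the helper name is ours, for illustration):

```python
def bottleneck_hidden_width(channels, base_width=64, groups=1):
    """Hidden channel count of a Bottleneck's 3x3 convolution."""
    return int(channels * (base_width / 64.0)) * groups
```

A plain resnet50 block with channels=64 keeps width 64, whereas a resnext50_32x4d-style block (groups=32, base_width=4) uses 128.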
mindocr.models.backbones.mindcv_models.resnet.ResNet
¶
Bases: nn.Cell
ResNet model class, based on
"Deep Residual Learning for Image Recognition" <https://arxiv.org/abs/1512.03385>_
| PARAMETER | DESCRIPTION |
|---|---|
| `block` | Block of ResNet. |
| `layers` | Number of layers in each stage. |
| `num_classes` | Number of classification classes. Default: 1000. |
| `in_channels` | Number of input channels. Default: 3. |
| `groups` | Number of groups for the group conv in blocks. Default: 1. |
| `base_width` | Base width of the per-group hidden channels in blocks. Default: 64. |
| `norm` | Normalization layer in blocks. Default: None. |
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 163-302)
mindocr.models.backbones.mindcv_models.resnet.ResNet.forward_features(x)
¶Network forward feature extraction.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 281-292)
mindocr.models.backbones.mindcv_models.resnet.resnet101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 101 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 347-358)
mindocr.models.backbones.mindcv_models.resnet.resnet152(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 152 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 361-372)
mindocr.models.backbones.mindcv_models.resnet.resnet18(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 18 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 305-316)
mindocr.models.backbones.mindcv_models.resnet.resnet34(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 34 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
(lines 319-330)
mindocr.models.backbones.mindcv_models.resnet.resnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 50 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
mindocr.models.backbones.mindcv_models.resnet.resnext101_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 101 layers ResNeXt model with 32 groups of GPConv.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
mindocr.models.backbones.mindcv_models.resnet.resnext101_64x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 101 layers ResNeXt model with 64 groups of GPConv.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
mindocr.models.backbones.mindcv_models.resnet.resnext50_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 50 layers ResNeXt model with 32 groups of GPConv.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnet.py
mindocr.models.backbones.mindcv_models.resnetv2
¶MindSpore implementation of ResNetV2.
Refer to Identity Mappings in Deep Residual Networks.
mindocr.models.backbones.mindcv_models.resnetv2.resnetv2_101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 101 layers ResNetV2 model.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnetv2.py
mindocr.models.backbones.mindcv_models.resnetv2.resnetv2_50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 50 layers ResNetV2 model.
Refer to the base class models.ResNet for more details.
Source code in mindocr\models\backbones\mindcv_models\resnetv2.py
mindocr.models.backbones.mindcv_models.rexnet
¶MindSpore implementation of ReXNet.
Refer to ReXNet: Rethinking Channel Dimensions for Efficient Model Design.
mindocr.models.backbones.mindcv_models.rexnet.LinearBottleneck
¶
Bases: nn.Cell
LinearBottleneck
Source code in mindocr\models\backbones\mindcv_models\rexnet.py
mindocr.models.backbones.mindcv_models.rexnet.ReXNetV1
¶
Bases: nn.Cell
ReXNet model class, based on
"Rethinking Channel Dimensions for Efficient Model Design" <https://arxiv.org/abs/2007.00992>_
| PARAMETER | DESCRIPTION |
|---|---|
| in_channels | number of the input channels. Default: 3. |
| fi_channels | number of the final channels. Default: 180. |
| initial_channels | initial number of inplanes. Default: 16. |
| width_mult | ratio scaling the number of channels. Default: 1.0. |
| depth_mult | ratio scaling the number of layers. Default: 1.0. |
| num_classes | number of classification classes. Default: 1000. |
| use_se | whether to use SENet in LinearBottleneck. Default: True. |
| se_ratio | SENet reduction ratio. Default: 1/12. |
| drop_rate | dropout ratio. Default: 0.2. |
| ch_div | make channel numbers divisible by ch_div. Default: 1. |
| act_layer | activation function in ConvNormAct. Default: nn.SiLU. |
| dw_act_layer | activation function after the depthwise convolution. Default: nn.ReLU6. |
| cls_useconv | whether to use a convolution in the classification head. Default: False. |

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x09(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ReXNet model with width multiplier of 0.9.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\rexnet.py
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x10(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ReXNet model with width multiplier of 1.0.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\rexnet.py
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x13(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ReXNet model with width multiplier of 1.3.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\rexnet.py
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x15(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ReXNet model with width multiplier of 1.5.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\rexnet.py
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x20(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ReXNet model with width multiplier of 2.0.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\rexnet.py
mindocr.models.backbones.mindcv_models.senet
¶MindSpore implementation of SENet.
Refer to Squeeze-and-Excitation Networks.
mindocr.models.backbones.mindcv_models.senet.Bottleneck
¶
Bases: nn.Cell
Define the base block class for the SENet, SEResNet, and SEResNeXt bottlenecks; it implements the construct method.
Source code in mindocr\models\backbones\mindcv_models\senet.py
mindocr.models.backbones.mindcv_models.senet.SEBottleneck
¶
Bases: Bottleneck
Define the Bottleneck for SENet154.
Source code in mindocr\models\backbones\mindcv_models\senet.py
mindocr.models.backbones.mindcv_models.senet.SENet
¶
Bases: nn.Cell
SENet model class, based on
"Squeeze-and-Excitation Networks" <https://arxiv.org/abs/1709.01507>_
| PARAMETER | DESCRIPTION |
|---|---|
| block | block class of SENet. |
| layers | number of residual blocks for the 4 layers. |
| group | number of groups for the conv in each bottleneck block. |
| reduction | reduction ratio for the Squeeze-and-Excitation modules. |
| drop_rate | drop probability for the Dropout layer. Default: 0. |
| in_channels | number of channels of the input. Default: 3. |
| inplanes | number of input channels for layer1. Default: 64. |
| input3x3 | if True, use three 3x3 convolutions instead of a single 7x7 convolution in the input stem. |
| downsample_kernel_size | kernel size for downsampling convolutions. Default: 1. |
| downsample_padding | padding for downsampling convolutions. Default: 0. |
| num_classes | number of classification classes. Default: 1000. |

Source code in mindocr\models\backbones\mindcv_models\senet.py
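The reduction parameter above sets the bottleneck width of each Squeeze-and-Excitation module: per-channel descriptors are squeezed from C to C/reduction and expanded back to C before a sigmoid produces per-channel gates. A minimal pure-Python sketch of that shape arithmetic (illustrative only; the toy averaging "FC" weights are assumptions, not the trained layers):

```python
import math

def se_gate(channel_means, reduction=16):
    """Illustrative Squeeze-and-Excitation gating on per-channel means.

    channel_means: per-channel globally average-pooled activations.
    The two FC layers are replaced by averaging 'weights' so only the
    squeeze/expand shape arithmetic is shown.
    """
    c = len(channel_means)
    hidden = max(1, c // reduction)                         # squeeze: C -> C // reduction
    squeezed = [max(0.0, sum(channel_means) / c)] * hidden  # toy FC + ReLU
    excited = [sum(squeezed) / hidden] * c                  # expand: hidden -> C
    return [1.0 / (1.0 + math.exp(-v)) for v in excited]    # sigmoid gates in (0, 1)

gates = se_gate([0.5] * 64, reduction=16)
assert len(gates) == 64 and all(0.0 < g < 1.0 for g in gates)
```

The resulting gates multiply the original feature channels, letting the network recalibrate channel importance at negligible parameter cost.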
mindocr.models.backbones.mindcv_models.senet.SEResNeXtBottleneck
¶
Bases: Bottleneck
Define the ResNeXt bottleneck type C with a Squeeze-and-Excitation module.
Source code in mindocr\models\backbones\mindcv_models\senet.py
mindocr.models.backbones.mindcv_models.senet.SEResNetBlock
¶
Bases: nn.Cell
Define the basic block of resnet with a Squeeze-and-Excitation module.
Source code in mindocr\models\backbones\mindcv_models\senet.py
mindocr.models.backbones.mindcv_models.senet.SEResNetBottleneck
¶
Bases: Bottleneck
Define the ResNet bottleneck with a Squeeze-and-Excitation module; its stride placement differs from the torchvision implementation of ResNet.
Source code in mindocr\models\backbones\mindcv_models\senet.py
mindocr.models.backbones.mindcv_models.shufflenetv1
¶MindSpore implementation of ShuffleNetV1.
Refer to ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
mindocr.models.backbones.mindcv_models.shufflenetv1.ShuffleNetV1
¶
Bases: nn.Cell
ShuffleNetV1 model class, based on
"ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices" <https://arxiv.org/abs/1707.01083>_
| PARAMETER | DESCRIPTION |
|---|---|
| num_classes | number of classification classes. Default: 1000. |
| in_channels | number of input channels. Default: 3. |
| model_size | scale factor controlling the number of channels. Default: '2.0x'. |
| group | number of groups for group convolution. Default: 3. |

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.ShuffleV1Block
¶
Bases: nn.Cell
Basic block of ShuffleNetV1: 1x1 group conv -> channel shuffle -> 3x3 depthwise conv -> 1x1 group conv.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
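The channel-shuffle step between the two group convolutions can be sketched in pure Python. This is a shape-level illustration of the reshape-and-transpose trick, not the MindSpore implementation (which operates on tensors):

```python
def channel_shuffle(channels, group):
    """View a flat channel list as (group, n), transpose to (n, group), flatten."""
    n = len(channels) // group
    return [channels[g * n + i] for i in range(n) for g in range(group)]

# channels 0..5 in 3 groups of 2: groups {0,1}, {2,3}, {4,5} get interleaved
print(channel_shuffle(list(range(6)), group=3))  # -> [0, 2, 4, 1, 3, 5]
```

Channels that started in the same group end up spread across all groups, which is what lets information flow between groups in the next group convolution.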
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 0.5 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 1.0 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x1_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 1.5 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x2_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 2.0 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 0.5 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 1.0 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x1_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 1.5 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x2_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV1 model with width scaled by 2.0 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
mindocr.models.backbones.mindcv_models.shufflenetv2
¶MindSpore implementation of ShuffleNetV2.
Refer to ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
mindocr.models.backbones.mindcv_models.shufflenetv2.ShuffleNetV2
¶
Bases: nn.Cell
ShuffleNetV2 model class, based on
"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" <https://arxiv.org/abs/1807.11164>_
| PARAMETER | DESCRIPTION |
|---|---|
| num_classes | number of classification classes. Default: 1000. |
| in_channels | number of input channels. Default: 3. |
| model_size | scale factor controlling the number of channels. Default: '1.5x'. |

Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
mindocr.models.backbones.mindcv_models.shufflenetv2.ShuffleV2Block
¶
Bases: nn.Cell
Define the basic block of ShuffleNetV2.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV2 model with width scaled by 0.5.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV2 model with width scaled by 1.0.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x1_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV2 model with width scaled by 1.5.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x2_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get ShuffleNetV2 model with width scaled by 2.0.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
mindocr.models.backbones.mindcv_models.sknet
¶MindSpore implementation of SKNet.
Refer to Selective Kernel Networks.
mindocr.models.backbones.mindcv_models.sknet.SKNet
¶
Bases: ResNet
SKNet model class, based on
"Selective Kernel Networks" <https://arxiv.org/abs/1903.06586>_
| PARAMETER | DESCRIPTION |
|---|---|
| block | block class of SKNet. |
| layers | number of layers in each stage. |
| num_classes | number of classification classes. Default: 1000. |
| in_channels | number of channels of the input. Default: 3. |
| groups | number of groups for group conv in blocks. Default: 1. |
| base_width | base width of per-group hidden channels in blocks. Default: 64. |
| norm | normalization layer in blocks. Default: None. |
| sk_kwargs | kwargs of the selective kernel. Default: None. |

Source code in mindocr\models\backbones\mindcv_models\sknet.py
mindocr.models.backbones.mindcv_models.sknet.SelectiveKernelBasic
¶
Bases: nn.Cell
Build the basic block of SKNet.
Source code in mindocr\models\backbones\mindcv_models\sknet.py
mindocr.models.backbones.mindcv_models.sknet.SelectiveKernelBottleneck
¶
Bases: nn.Cell
Build the bottleneck block of SKNet.
Source code in mindocr\models\backbones\mindcv_models\sknet.py
mindocr.models.backbones.mindcv_models.sknet.skresnet18(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 18 layers SKNet model.
Refer to the base class models.SKNet for more details.
Source code in mindocr\models\backbones\mindcv_models\sknet.py
mindocr.models.backbones.mindcv_models.sknet.skresnet34(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 34 layers SKNet model.
Refer to the base class models.SKNet for more details.
Source code in mindocr\models\backbones\mindcv_models\sknet.py
mindocr.models.backbones.mindcv_models.sknet.skresnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 50 layers SKNet model.
Refer to the base class models.SKNet for more details.
Source code in mindocr\models\backbones\mindcv_models\sknet.py
mindocr.models.backbones.mindcv_models.sknet.skresnext50_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get 50 layers SKNeXt model with 32 groups of GPConv.
Refer to the base class models.SKNet for more details.
Source code in mindocr\models\backbones\mindcv_models\sknet.py
mindocr.models.backbones.mindcv_models.squeezenet
¶MindSpore implementation of SqueezeNet.
Refer to SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.
mindocr.models.backbones.mindcv_models.squeezenet.Fire
¶
Bases: nn.Cell
Define the basic Fire block of SqueezeNet.
Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
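The Fire block's channel bookkeeping is simple: a 1x1 squeeze layer narrows the channels, then parallel 1x1 and 3x3 expand branches read the squeezed map and are concatenated. A sketch of the arithmetic (the 96/16/64/64 figures follow the first Fire module described in the SqueezeNet paper and are for illustration only):

```python
def fire_out_channels(in_channels, squeeze_planes, expand1x1_planes, expand3x3_planes):
    """Output channels of a Fire block: both expand branches read the
    squeezed feature map and are concatenated along the channel axis."""
    assert squeeze_planes < in_channels  # the squeeze layer is a bottleneck
    return expand1x1_planes + expand3x3_planes

# first Fire module of SqueezeNet 1.0: 96 -> squeeze 16 -> expand 64 + 64
print(fire_out_channels(96, 16, 64, 64))  # -> 128
```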
mindocr.models.backbones.mindcv_models.squeezenet.SqueezeNet
¶
Bases: nn.Cell
SqueezeNet model class, based on
"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size" <https://arxiv.org/abs/1602.07360>_
.. note:: Important: in contrast to the other models, SqueezeNet expects tensors with a size of N x 3 x 227 x 227, so ensure your images are sized accordingly.
| PARAMETER | DESCRIPTION |
|---|---|
| version | version of the architecture, '1_0' or '1_1'. Default: '1_0'. |
| num_classes | number of classification classes. Default: 1000. |
| drop_rate | dropout rate of the classifier. Default: 0.5. |
| in_channels | number of channels of the input. Default: 3. |

Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
mindocr.models.backbones.mindcv_models.squeezenet.squeezenet1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get SqueezeNet model of version 1.0.
Refer to the base class models.SqueezeNet for more details.
Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
mindocr.models.backbones.mindcv_models.squeezenet.squeezenet1_1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get SqueezeNet model of version 1.1.
Refer to the base class models.SqueezeNet for more details.
Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
mindocr.models.backbones.mindcv_models.swin_transformer
¶Define SwinTransformer model
mindocr.models.backbones.mindcv_models.swin_transformer.BasicLayer
¶
Bases: nn.Cell
A basic Swin Transformer layer for one stage.
| PARAMETER | DESCRIPTION |
|---|---|
| dim | number of input channels. |
| input_resolution | input resolution. |
| depth | number of blocks. |
| num_heads | number of attention heads. |
| window_size | local window size. |
| mlp_ratio | ratio of MLP hidden dim to embedding dim. |
| qkv_bias | if True, add a learnable bias to query, key, value. Default: True. |
| qk_scale | override the default qk scale of head_dim ** -0.5 if set. |
| drop | dropout rate. Default: 0.0. |
| attn_drop | attention dropout rate. Default: 0.0. |
| drop_path | stochastic depth rate. Default: 0.0. |
| norm_layer | normalization layer. Default: nn.LayerNorm. |
| downsample | downsample layer at the end of the layer. Default: None. |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.PatchEmbed
¶
Bases: nn.Cell
Image to Patch Embedding
| PARAMETER | DESCRIPTION |
|---|---|
| image_size | image size. Default: 224. |
| patch_size | patch token size. Default: 4. |
| in_chans | number of input image channels. Default: 3. |
| embed_dim | number of linear projection output channels. Default: 96. |
| norm_layer | normalization layer. Default: None. |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.PatchMerging
¶
Bases: nn.Cell
Patch Merging Layer.
| PARAMETER | DESCRIPTION |
|---|---|
| input_resolution | resolution of the input feature. |
| dim | number of input channels. |
| norm_layer | normalization layer. Default: nn.LayerNorm. |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.PatchMerging.construct(x)
¶Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.SwinTransformer
¶
Bases: nn.Cell
SwinTransformer model class, based on
"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" <https://arxiv.org/pdf/2103.14030>_
| PARAMETER | DESCRIPTION |
|---|---|
| image_size | input image size. Default: 224. |
| patch_size | patch size. Default: 4. |
| in_chans | number of input image channels. Default: 3. |
| num_classes | number of classes for the classification head. Default: 1000. |
| embed_dim | patch embedding dimension. Default: 96. |
| depths | depth of each Swin Transformer layer. |
| num_heads | number of attention heads in different layers. |
| window_size | window size. Default: 7. |
| mlp_ratio | ratio of MLP hidden dim to embedding dim. Default: 4. |
| qkv_bias | if True, add a learnable bias to query, key, value. Default: True. |
| qk_scale | override the default qk scale of head_dim ** -0.5 if set. Default: None. |
| drop_rate | dropout rate. Default: 0. |
| attn_drop_rate | attention dropout rate. Default: 0. |
| drop_path_rate | stochastic depth rate. Default: 0.1. |
| norm_layer | normalization layer. Default: nn.LayerNorm. |
| ape | if True, add absolute position embedding to the patch embedding. Default: False. |
| patch_norm | if True, add normalization after patch embedding. Default: True. |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
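The per-stage geometry implied by these parameters can be computed directly: patch embedding yields an (image_size / patch_size)² token grid at dimension embed_dim, and each PatchMerging halves the resolution while doubling the channel dimension. A small sketch using the defaults from the table above:

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """(feature resolution per side, channel dim) per stage of a Swin-style backbone."""
    res = image_size // patch_size           # 224 // 4 = 56 tokens per side
    shapes = []
    for stage in range(num_stages):
        shapes.append((res, embed_dim * 2 ** stage))
        res //= 2                            # PatchMerging halves H and W
    return shapes

print(swin_stage_shapes())  # -> [(56, 96), (28, 192), (14, 384), (7, 768)]
```

Note that the final 7 x 7 resolution matches the default window_size of 7, so the last stage attends globally within a single window.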
mindocr.models.backbones.mindcv_models.swin_transformer.SwinTransformerBlock
¶
Bases: nn.Cell
Swin Transformer Block.
| PARAMETER | DESCRIPTION |
|---|---|
| dim | number of input channels. |
| input_resolution | input resolution. |
| num_heads | number of attention heads. |
| window_size | window size. |
| shift_size | shift size for SW-MSA. |
| mlp_ratio | ratio of MLP hidden dim to embedding dim. |
| qkv_bias | if True, add a learnable bias to query, key, value. Default: True. |
| qk_scale | override the default qk scale of head_dim ** -0.5 if set. |
| drop | dropout rate. Default: 0.0. |
| attn_drop | attention dropout rate. Default: 0.0. |
| drop_path | stochastic depth rate. Default: 0.0. |
| act_layer | activation layer. Default: nn.GELU. |
| norm_layer | normalization layer. Default: nn.LayerNorm. |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.WindowAttention
¶
Bases: nn.Cell
Window-based multi-head self-attention (W-MSA) Cell with relative position bias. It supports both shifted and non-shifted windows.
| PARAMETER | DESCRIPTION |
|---|---|
| dim | number of input channels. |
| window_size | the height and width of the window. |
| num_heads | number of attention heads. |
| qkv_bias | if True, add a learnable bias to query, key, value. Default: True. |
| qk_scale | override the default qk scale of head_dim ** -0.5 if set. |
| attn_drop | dropout ratio of attention weights. Default: 0.0. |
| proj_drop | dropout ratio of the output. Default: 0.0. |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.WindowAttention.construct(x, mask=None)
¶
| PARAMETER | DESCRIPTION |
|---|---|
| x | input features with shape (num_windows*B, N, C) |
| mask | (0/-inf) mask with shape (num_windows, Wh*Ww, Wh*Ww), or None |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.WindowPartition
¶
Bases: nn.Cell
Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.WindowPartition.construct(x)
¶
| PARAMETER | DESCRIPTION |
|---|---|
| x | input tensor of shape (b, h, w, c) |
| window_size | window size |

| RETURNS | DESCRIPTION |
|---|---|
| windows | Tensor of shape (num_windows*b, window_size, window_size, c) |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.WindowReverse
¶
Bases: nn.Cell
Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.WindowReverse.construct(windows, window_size, h, w)
¶
| PARAMETER | DESCRIPTION |
|---|---|
| windows | input windows of shape (num_windows*B, window_size, window_size, C) |
| window_size | window size |
| h | height of image |
| w | width of image |

| RETURNS | DESCRIPTION |
|---|---|
| x | Tensor of shape (B, H, W, C) |

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.swin_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get SwinTransformer tiny model. Refer to the base class models.SwinTransformer for more details.
Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
mindocr.models.backbones.mindcv_models.swin_transformer.window_partition(x, window_size)
¶| PARAMETER | DESCRIPTION |
|---|---|
x |
(B, H, W, C)
|
window_size |
window size
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
windows
|
numpy(num_windows*B, window_size, window_size, C) |
Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
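Both `window_partition` and `WindowReverse.construct` above are pure shape manipulations. A standalone NumPy sketch (illustrative re-implementations, not the MindSpore code itself) shows the documented shapes and the round trip:

```python
import numpy as np

def window_partition(x, window_size):
    # x: (B, H, W, C) -> windows: (num_windows*B, window_size, window_size, C)
    b, h, w, c = x.shape
    x = x.reshape(b, h // window_size, window_size, w // window_size, window_size, c)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, c)

def window_reverse(windows, window_size, h, w):
    # windows: (num_windows*B, window_size, window_size, C) -> x: (B, H, W, C)
    b = windows.shape[0] // (h * w // window_size // window_size)
    x = windows.reshape(b, h // window_size, w // window_size, window_size, window_size, -1)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)

x = np.arange(2 * 8 * 8 * 3).reshape(2, 8, 8, 3).astype(np.float32)
win = window_partition(x, 4)          # (8, 4, 4, 3): 2x2 = 4 windows per image
back = window_reverse(win, 4, 8, 8)   # the round trip restores the original layout
```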
mindocr.models.backbones.mindcv_models.utils
¶Some utils while building models
mindocr.models.backbones.mindcv_models.utils.ConfigDict
¶
Bases: dict
dot.notation access to dictionary attributes
Source code in mindocr\models\backbones\mindcv_models\utils.py
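The dot-notation access described above is a small, common dict-subclass trick; a minimal sketch (the actual mindocr implementation may differ in detail):

```python
class ConfigDict(dict):
    """dict subclass allowing attribute-style access to keys."""
    __getattr__ = dict.get          # cfg.lr      -> cfg['lr'] (None if missing)
    __setattr__ = dict.__setitem__  # cfg.lr = .1 -> cfg['lr'] = .1
    __delattr__ = dict.__delitem__

cfg = ConfigDict(lr=0.1, epochs=10)
cfg.batch_size = 32   # stored as a regular dict item
```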
mindocr.models.backbones.mindcv_models.utils.auto_map(model, param_dict)
¶Rename part of the param_dict so that parameter names from the checkpoint and the model are consistent
Source code in mindocr\models\backbones\mindcv_models\utils.py
mindocr.models.backbones.mindcv_models.utils.download_pretrained(default_cfg)
¶Download the pretrained ckpt from url to local path
Source code in mindocr\models\backbones\mindcv_models\utils.py
mindocr.models.backbones.mindcv_models.utils.load_pretrained(model, default_cfg, num_classes=1000, in_channels=3, filter_fn=None, auto_mapping=False)
¶load pretrained model depending on cfgs of model
Source code in mindocr\models\backbones\mindcv_models\utils.py
mindocr.models.backbones.mindcv_models.utils.make_divisible(v, divisor, min_value=None)
¶Find the smallest integer larger than v that is divisible by divisor.
Source code in mindocr\models\backbones\mindcv_models\utils.py
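Taking the one-line docstring at face value, the behavior can be sketched as rounding v up to the next multiple of divisor, clamped below by min_value (note: some make_divisible variants round to the nearest multiple instead; check the source for the exact rule):

```python
def make_divisible(v, divisor, min_value=None):
    # Round v up to the next multiple of `divisor`, clamped below by `min_value`.
    if min_value is None:
        min_value = divisor
    return max(min_value, ((int(v) + divisor - 1) // divisor) * divisor)

make_divisible(37, 8)   # -> 40
make_divisible(3, 8)    # -> 8 (clamped to min_value = divisor)
```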
mindocr.models.backbones.mindcv_models.vgg
¶MindSpore implementation of VGGNet.
Refer to Very Deep Convolutional Networks for Large-Scale Image Recognition.
mindocr.models.backbones.mindcv_models.vgg.VGG
¶
Bases: nn.Cell
VGGNet model class, based on
"Very Deep Convolutional Networks for Large-Scale Image Recognition" <https://arxiv.org/abs/1409.1556>_
| PARAMETER | DESCRIPTION |
|---|---|
model_name |
name of the architecture. 'vgg11', 'vgg13', 'vgg16' or 'vgg19'.
TYPE:
|
batch_norm |
use batch normalization or not. Default: False.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number the channels of the input. Default: 3.
TYPE:
|
drop_rate |
dropout rate of the classifier. Default: 0.5.
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\vgg.py
mindocr.models.backbones.mindcv_models.vgg.vgg11(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 11-layer VGG model.
Refer to the base class models.VGG for more details.
Source code in mindocr\models\backbones\mindcv_models\vgg.py
mindocr.models.backbones.mindcv_models.vgg.vgg13(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 13-layer VGG model.
Refer to the base class models.VGG for more details.
Source code in mindocr\models\backbones\mindcv_models\vgg.py
mindocr.models.backbones.mindcv_models.vgg.vgg16(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 16-layer VGG model.
Refer to the base class models.VGG for more details.
Source code in mindocr\models\backbones\mindcv_models\vgg.py
mindocr.models.backbones.mindcv_models.vgg.vgg19(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get the 19-layer VGG model.
Refer to the base class models.VGG for more details.
Source code in mindocr\models\backbones\mindcv_models\vgg.py
mindocr.models.backbones.mindcv_models.visformer
¶MindSpore implementation of Visformer.
Refer to: Visformer: The Vision-friendly Transformer
mindocr.models.backbones.mindcv_models.visformer.Attention
¶
Bases: nn.Cell
Attention layer
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.visformer.Block
¶
Bases: nn.Cell
visformer basic block
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.visformer.Mlp
¶
Bases: nn.Cell
MLP layer
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.visformer.Visformer
¶
Bases: nn.Cell
Visformer model class, based on "Visformer: The Vision-friendly Transformer" <https://arxiv.org/pdf/2104.12533.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
images input size. Default: 224.
TYPE:
|
number |
32.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
embed_dim |
embedding dimension in all head. Default: 384.
TYPE:
|
depth |
model block depth. Default: None.
TYPE:
|
num_heads |
number of heads. Default: None.
TYPE:
|
mlp_ratio |
ratio of hidden features in Mlp. Default: 4.
TYPE:
|
qkv_bias |
have bias in qkv layers or not. Default: False.
TYPE:
|
qk_scale |
Override default qk scale of head_dim ** -0.5 if set.
TYPE:
|
drop_rate |
dropout rate. Default: 0.
TYPE:
|
attn_drop_rate |
attention layers dropout rate. Default: 0.
TYPE:
|
drop_path_rate |
drop path rate. Default: 0.1.
TYPE:
|
attn_stage |
block will have an attention layer if value = '1' else not. Default: '1111'.
TYPE:
|
pos_embed |
position embedding. Default: True.
TYPE:
|
spatial_conv |
block will have a spatial convolution layer if value = '1' else not. Default: '1111'.
TYPE:
|
group |
convolution group. Default: 8.
TYPE:
|
pool |
if True, use global pooling; otherwise not. Default: True.
TYPE:
|
conv_init |
if True, initialize convolution weights; otherwise not.
DEFAULT:
|
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.visformer.visformer_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get visformer small model. Refer to the base class 'models.visformer' for more details.
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.visformer.visformer_small_v2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get visformer small2 model. Refer to the base class 'models.visformer' for more details.
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.visformer.visformer_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get visformer tiny model. Refer to the base class 'models.visformer' for more details.
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.visformer.visformer_tiny_v2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get visformer tiny2 model. Refer to the base class 'models.visformer' for more details.
Source code in mindocr\models\backbones\mindcv_models\visformer.py
mindocr.models.backbones.mindcv_models.vit
¶ViT
mindocr.models.backbones.mindcv_models.vit.Attention
¶
Bases: nn.Cell
Attention layer implementation, Rearrange Input -> B x N x hidden size.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
The dimension of input features.
TYPE:
|
num_heads |
The number of attention heads. Default: 8.
TYPE:
|
keep_prob |
The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.
TYPE:
|
attention_keep_prob |
The keep rate for attention. Default: 1.0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = Attention(768, 12)
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.Attention.construct(x)
¶Attention construct.
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.BaseClassifier
¶
Bases: nn.Cell
generate classifier to combine the backbone and head
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.DenseHead
¶
Bases: nn.Cell
LinearClsHead architecture.
| PARAMETER | DESCRIPTION |
|---|---|
input_channel |
The number of input channel.
TYPE:
|
num_classes |
Number of classes.
TYPE:
|
has_bias |
Specifies whether the layer uses a bias vector. Default: True.
TYPE:
|
activation |
activation function applied to the output. Eg.
TYPE:
|
keep_prob |
Dropout keeping rate, between [0, 1]. E.g. rate=0.9, means dropping out 10% of input. Default: 1.0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.DropPath
¶
Bases: nn.Cell
Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
Source code in mindocr\models\backbones\mindcv_models\vit.py
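Stochastic depth drops the whole residual branch per sample during training and rescales the survivors. A NumPy sketch of the idea (assuming keep-probability semantics, as used throughout this file):

```python
import numpy as np

def drop_path(x, keep_prob, training=True, rng=np.random.default_rng(0)):
    # x: (B, ...) — zero out the branch output for some samples, rescale the rest
    if not training or keep_prob == 1.0:
        return x
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)   # one Bernoulli draw per sample
    mask = (rng.random(mask_shape) < keep_prob).astype(x.dtype)
    return x / keep_prob * mask   # rescale so the expected value is unchanged

x = np.ones((4, 3, 2), dtype=np.float32)
y = drop_path(x, keep_prob=0.5)   # each sample is either all zeros or scaled by 2
```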
mindocr.models.backbones.mindcv_models.vit.FeedForward
¶
Bases: nn.Cell
Feed Forward layer implementation.
| PARAMETER | DESCRIPTION |
|---|---|
in_features |
The dimension of input features.
TYPE:
|
hidden_features |
The dimension of hidden features. Default: None.
TYPE:
|
out_features |
The dimension of output features. Default: None
TYPE:
|
activation |
Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the dense layer. Default: nn.GELU.
TYPE:
|
keep_prob |
The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = FeedForward(768, 3072)
Source code in mindocr\models\backbones\mindcv_models\vit.py
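Functionally the FeedForward block is Dense → activation → Dense (with dropout between them). A shape-level NumPy sketch with the default ViT dimensions (the random weights are stand-ins, not trained parameters):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, w1, w2):
    # x: (B, N, in_features) -> (B, N, out_features)
    return gelu(x @ w1) @ w2

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 197, 768))          # 196 patches + cls token
w1 = rng.standard_normal((768, 3072)) * 0.02    # in_features -> hidden_features
w2 = rng.standard_normal((3072, 768)) * 0.02    # hidden_features -> out_features
y = feed_forward(x, w1, w2)
```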
mindocr.models.backbones.mindcv_models.vit.FeedForward.construct(x)
¶Feed Forward construct.
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.MultilayerDenseHead
¶
Bases: nn.Cell
MultilayerDenseHead architecture.
| PARAMETER | DESCRIPTION |
|---|---|
input_channel |
The number of input channel.
TYPE:
|
num_classes |
Number of classes.
TYPE:
|
mid_channel |
Number of channels in the hidden fc layers.
TYPE:
|
keep_prob |
Dropout keeping rate, between [0, 1]. E.g. rate=0.9, means dropping out 10% of input.
TYPE:
|
activation |
activation function applied to the output. Eg.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.PatchEmbedding
¶
Bases: nn.Cell
Patch embedding layer for ViT. First rearrange b c (h p) (w p) -> b (h w) (p p c).
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
Input image size. Default: 224.
TYPE:
|
patch_size |
Patch size of image. Default: 16.
TYPE:
|
embed_dim |
The dimension of embedding. Default: 768.
TYPE:
|
input_channels |
The number of input channel. Default: 3.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = PatchEmbedding(224, 16, 768, 3)
Source code in mindocr\models\backbones\mindcv_models\vit.py
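The rearrangement above fixes the token count: an image of size image_size with patch size p yields (image_size/p)^2 tokens of dimension p·p·c before the linear projection. A NumPy sketch with the documented defaults (illustrative only; the real layer uses a convolution):

```python
import numpy as np

def patch_embed(x, patch_size, proj):
    # x: (B, C, H, W) -> tokens: (B, num_patches, embed_dim)
    b, c, h, w = x.shape
    p = patch_size
    x = x.reshape(b, c, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 3, 5, 1).reshape(b, (h // p) * (w // p), p * p * c)
    return x @ proj   # linear projection to embed_dim

x = np.zeros((1, 3, 224, 224), dtype=np.float32)
proj = np.zeros((16 * 16 * 3, 768), dtype=np.float32)
tokens = patch_embed(x, 16, proj)   # (1, 196, 768): 14 x 14 patches
```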
mindocr.models.backbones.mindcv_models.vit.PatchEmbedding.construct(x)
¶Patch Embedding construct.
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.ResidualCell
¶
Bases: nn.Cell
Cell which implements Residual function:
| PARAMETER | DESCRIPTION |
|---|---|
cell |
Cell needed to add residual block.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = ResidualCell(nn.Dense(3,4))
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.ResidualCell.construct(x)
¶ResidualCell construct.
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.TransformerEncoder
¶
Bases: nn.Cell
TransformerEncoder implementation.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
The dimension of embedding.
TYPE:
|
num_layers |
The depth of transformer.
TYPE:
|
num_heads |
The number of attention heads.
TYPE:
|
mlp_dim |
The dimension of MLP hidden layer.
TYPE:
|
keep_prob |
The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.
TYPE:
|
attention_keep_prob |
The keep rate for attention. Default: 1.0.
TYPE:
|
drop_path_keep_prob |
The keep rate for drop path. Default: 1.0.
TYPE:
|
activation |
Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the dense layer. Default: nn.GELU.
TYPE:
|
norm |
Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = TransformerEncoder(768, 12, 12, 3072)
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.TransformerEncoder.construct(x)
¶Transformer construct.
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.ViT
¶
Bases: nn.Cell
Vision Transformer architecture implementation.
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
Input image size. Default: 224.
TYPE:
|
input_channels |
The number of input channel. Default: 3.
TYPE:
|
patch_size |
Patch size of image. Default: 16.
TYPE:
|
embed_dim |
The dimension of embedding. Default: 768.
TYPE:
|
num_layers |
The depth of transformer. Default: 12.
TYPE:
|
num_heads |
The number of attention heads. Default: 12.
TYPE:
|
mlp_dim |
The dimension of MLP hidden layer. Default: 3072.
TYPE:
|
keep_prob |
The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.
TYPE:
|
attention_keep_prob |
The keep rate for attention layer. Default: 1.0.
TYPE:
|
drop_path_keep_prob |
The keep rate for drop path. Default: 1.0.
TYPE:
|
activation |
Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.
TYPE:
|
norm |
Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.
TYPE:
|
pool |
The method of pooling. Default: 'cls'.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, 768)`
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
Supported Platforms
GPU
Examples:
>>> net = ViT()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 768)
About ViT:
Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
Citation:
.. code-block::
@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.ViT.construct(x)
¶ViT construct.
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit(image_size, input_channels, patch_size, embed_dim, num_layers, num_heads, num_classes, mlp_dim, dropout=0.0, attention_dropout=0.0, drop_path_rate=0.0, activation=nn.GELU, norm=nn.LayerNorm, pool='cls', representation_size=None, pretrained=False, url_cfg=None)
¶Vision Transformer architecture.
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit_b_16_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)
¶Constructs a vit_b_16 architecture from
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
Whether to download and load the pre-trained model. Default: False.
TYPE:
|
num_classes |
The number of classification. Default: 1000.
TYPE:
|
in_channels |
The number of input channels. Default: 3.
TYPE:
|
image_size |
The input image size. Default: 224 for ImageNet.
TYPE:
|
has_logits |
Whether has logits or not. Default: False.
TYPE:
|
drop_rate |
The dropout rate. Default: 0.0.
TYPE:
|
drop_path_rate |
The stochastic depth rate. Default: 0.0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ViT
|
ViT network, MindSpore.nn.Cell |
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Examples:
>>> net = vit_b_16_224()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`
Supported Platforms
GPU
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit_b_16_384(pretrained=False, num_classes=1000, in_channels=3, image_size=384, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)
¶construct and return a ViT network
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit_b_32_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)
¶construct and return a ViT network
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit_b_32_384(pretrained=False, num_classes=1000, in_channels=3, image_size=384, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)
¶construct and return a ViT network
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit_l_16_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)
¶construct and return a ViT network
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit_l_16_384(pretrained=False, num_classes=1000, in_channels=3, image_size=384, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)
¶construct and return a ViT network
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.vit.vit_l_32_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)
¶construct and return a ViT network
Source code in mindocr\models\backbones\mindcv_models\vit.py
mindocr.models.backbones.mindcv_models.xcit
¶MindSpore implementation of XCiT. Refer to: XCiT: Cross-Covariance Image Transformers
mindocr.models.backbones.mindcv_models.xcit.ClassAttention
¶
Bases: nn.Cell
Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_models.xcit.ClassAttentionBlock
¶
Bases: nn.Cell
Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_models.xcit.ConvPatchEmbed
¶
Bases: nn.Cell
Image to Patch Embedding using multiple convolutional layers
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_models.xcit.LPI
¶
Bases: nn.Cell
Local Patch Interaction module that allows explicit communication between tokens in 3x3 windows to augment the implicit communication performed by the block diagonal scatter attention. Implemented using 2 layers of separable 3x3 convolutions with GeLU and BatchNorm2d
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_models.xcit.PositionalEncodingFourier
¶
Bases: nn.Cell
Positional encoding relying on a Fourier kernel matching the one used in the "Attention Is All You Need" paper. The implementation builds on the DETR code https://github.com/facebookresearch/detr/blob/master/models/position_encoding.py
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_models.xcit.XCA
¶
Bases: nn.Cell
Cross-Covariance Attention (XCA) operation where the channels are updated using a weighted sum. The weights are obtained from the (softmax-normalized) cross-covariance matrix :math:`Q^T K \in d_h \times d_h`
Source code in mindocr\models\backbones\mindcv_models\xcit.py
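Unlike standard self-attention, whose attention map is N x N over tokens, XCA forms a d_h x d_h map over channels from the cross-covariance Q^T K. A shape-level, single-head NumPy sketch (illustrative only; the real layer adds heads, a learnable temperature, and projections):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def xca(q, k, v, temperature=1.0):
    # q, k, v: (N, d_h). Attention map is (d_h, d_h), not (N, N).
    q = q / np.linalg.norm(q, axis=0, keepdims=True)   # L2-normalize each channel over tokens
    k = k / np.linalg.norm(k, axis=0, keepdims=True)
    attn = softmax(temperature * (q.T @ k), axis=-1)   # (d_h, d_h) cross-covariance map
    return v @ attn.T                                  # channels updated as weighted sums

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((196, 64)) for _ in range(3))
out = xca(q, k, v)   # same (N, d_h) shape as the input
```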
mindocr.models.backbones.mindcv_models.xcit.XCiT
¶
Bases: nn.Cell
XCiT model class, based on
"XCiT: Cross-Covariance Image Transformers" <https://arxiv.org/abs/2106.09681>_
| PARAMETER | DESCRIPTION |
|---|---|
img_size |
input image size
TYPE:
|
patch_size |
patch size
TYPE:
|
in_chans |
number of input channels
TYPE:
|
num_classes |
number of classes for classification head
TYPE:
|
embed_dim |
embedding dimension
TYPE:
|
depth |
depth of transformer
TYPE:
|
num_heads |
number of attention heads
TYPE:
|
mlp_ratio |
ratio of mlp hidden dim to embedding dim
TYPE:
|
qkv_bias |
enable bias for qkv if True
TYPE:
|
qk_scale |
override default qk scale of head_dim ** -0.5 if set
TYPE:
|
drop_rate |
dropout rate
TYPE:
|
attn_drop_rate |
attention dropout rate
TYPE:
|
drop_path_rate |
stochastic depth rate
TYPE:
|
norm_layer |
normalization layer
TYPE:
|
cls_attn_layers |
depth of class attention layers
TYPE:
|
use_pos |
whether to use positional encoding
TYPE:
|
eta |
layerscale initialization value
TYPE:
|
tokens_norm |
whether to normalize all tokens or just the cls_token in the CA
TYPE:
|
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_models.xcit.conv3x3(in_planes, out_planes, stride=1)
¶3x3 convolution with padding
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_models.xcit.xcit_tiny_12_p16(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶Get xcit_tiny_12_p16 model. Refer to the base class 'models.XCiT' for more details.
Source code in mindocr\models\backbones\mindcv_models\xcit.py
mindocr.models.backbones.mindcv_wrapper
¶
mindocr.models.backbones.mindcv_wrapper.MindCVBackboneWrapper
¶
Bases: nn.Cell
It reuses the forward_features interface in mindcv models. Please check where the features are extracted.
Note: text recognition models like CRNN expect output features in shape [bs, c, h, w], but some models in mindcv, like ViT, output features in shape [bs, c]. Please check and pick accordingly.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
Whether the model backbone is pretrained. Default: True.
TYPE:
|
checkpoint_path |
The path of checkpoint files. Default: "".
TYPE:
|
features_only |
Output the features at different strides instead. Default: False
TYPE:
|
out_indices |
The indices of the output features when
TYPE:
|
Example
network = MindCVBackboneWrapper('resnet50', pretrained=True)
Source code in mindocr\models\backbones\mindcv_wrapper.py
mindocr.models.backbones.rec_vgg
¶
mindocr.models.backbones.rec_vgg.RecVGG
¶
Bases: nn.Cell
VGG Network structure
Source code in mindocr\models\backbones\rec_vgg.py
mindocr.models.base_model
¶
mindocr.models.base_model.BaseModel
¶
Bases: nn.Cell
Source code in mindocr\models\base_model.py
mindocr.models.base_model.BaseModel.__init__(config)
¶| PARAMETER | DESCRIPTION |
|---|---|
config |
model config
TYPE:
|
Inputs
- x (Tensor): The input tensor feeding into the backbone, neck and head sequentially.
- y (Tensor): The extra input tensor. If it is provided, it will feed into the head. Default: None
Source code in mindocr\models\base_model.py
mindocr.models.builder
¶
build models
mindocr.models.builder.build_model(name_or_config, **kwargs)
¶
There are two ways to build a model: 1) load a predefined model according to the given model name; 2) build the model according to the detailed configuration of each module (transform, backbone, neck and head), for lower-level architecture customization.
| PARAMETER | DESCRIPTION |
|---|---|
name_or_config |
model name or config. If it's a string, it should be a model name (which can be found by mindocr.list_models()). If it's a dict, it should be an architecture configuration defining the backbone/neck/head components (e.g., parsed from a yaml config).
TYPE:
|
kwargs |
options. If name_or_config is a model name, supported args in kwargs are:
- pretrained (bool): if True, a pretrained checkpoint will be downloaded and loaded into the network.
- ckpt_load_path (str): path to a checkpoint file; if a non-empty string is given, the local checkpoint will be loaded into the network.
If name_or_config is an architecture definition dict, supported args are:
- ckpt_load_path (str): path to a checkpoint file.
TYPE:
|
Return
nn.Cell
>>> from mindocr.models import build_model
>>> net = build_model(cfg['model'])
>>> net = build_model(cfg['model'], ckpt_load_path='./r50_fpn_dbhead.ckpt')  # build network and load checkpoint
>>> net = build_model('dbnet_resnet50', pretrained=True)
Source code in mindocr\models\builder.py
Lines 15–78
mindocr.models.cls_mv3
¶
mindocr.models.det_dbnet
¶
mindocr.models.det_east
¶
mindocr.models.det_psenet
¶
mindocr.models.heads
¶
mindocr.models.heads.build_head(head_name, **kwargs)
¶
Build Head network.
| PARAMETER | DESCRIPTION |
|---|---|
head_name |
the head layer(s) name, which should be one of the supported_heads.
TYPE:
|
kwargs |
input args for the head network
TYPE:
|
Return
nn.Cell for head module
Construct
Example
build CTCHead¶
>>> from mindocr.models.heads import build_head
>>> config = dict(head_name='CTCHead', in_channels=256, out_channels=37)
>>> head = build_head(**config)
>>> print(head)
Source code in mindocr\models\heads\builder.py
Lines 12–36
mindocr.models.heads.builder
¶
mindocr.models.heads.builder.build_head(head_name, **kwargs)
¶Build Head network.
| PARAMETER | DESCRIPTION |
|---|---|
head_name |
the head layer(s) name, which should be one of the supported_heads.
TYPE:
|
kwargs |
input args for the head network
TYPE:
|
Return
nn.Cell for head module
Construct
Example
build CTCHead¶
>>> from mindocr.models.heads import build_head
>>> config = dict(head_name='CTCHead', in_channels=256, out_channels=37)
>>> head = build_head(**config)
>>> print(head)
Source code in mindocr\models\heads\builder.py
Lines 12–36
mindocr.models.heads.cls_mv3_head
¶
mindocr.models.heads.cls_mv3_head.ClsHead
¶
Bases: nn.Cell
Text direction classification head.
Source code in mindocr\models\heads\cls_mv3_head.py
Lines 7–27
mindocr.models.heads.det_db_head
¶
mindocr.models.heads.det_db_head.DBHead
¶
Bases: nn.Cell
Source code in mindocr\models\heads\det_db_head.py
Lines 7–52
mindocr.models.heads.det_db_head.DBHead.construct(features)
¶| PARAMETER | DESCRIPTION |
|---|---|
features |
encoded features
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[ms.Tensor, Tuple[ms.Tensor, ...]]
|
Union( |
binary
|
predicted binary map
TYPE:
|
thresh
|
predicted threshold map (only return if adaptive is True in training)
TYPE:
|
thresh_binary
|
differentiable binary map (only if adaptive is True in training)
TYPE:
|
Source code in mindocr\models\heads\det_db_head.py
Lines 34–52
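The differentiable binary map above is produced by the approximate step function from the DBNet paper, which combines the probability (binary) map and the threshold map through a steep sigmoid. A numpy sketch (k=50 as in the paper; a sketch, not the actual MindOCR code):

```python
import numpy as np

def differentiable_binarization(binary, thresh, k=50.0):
    """Approximate step function from the DBNet paper:
    B_hat = 1 / (1 + exp(-k * (P - T)))."""
    return 1.0 / (1.0 + np.exp(-k * (binary - thresh)))

P = np.array([[0.9, 0.1], [0.5, 0.5]])   # probability (binary) map
T = np.array([[0.3, 0.3], [0.5, 0.2]])   # threshold map
B = differentiable_binarization(P, T)
print(np.round(B, 3))
```

Pixels well above the threshold map saturate to 1, pixels well below saturate to 0, and pixels exactly at the threshold give 0.5, so gradients can flow through the binarization during training.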
mindocr.models.heads.rec_attn_head
¶
mindocr.models.heads.rec_attn_head.AttentionHead
¶
Bases: nn.Cell
Source code in mindocr\models\heads\rec_attn_head.py
Lines 13–77
mindocr.models.heads.rec_attn_head.AttentionHead.__init__(in_channels, out_channels, hidden_size=256, batch_max_length=25)
¶Inputs
Source code in mindocr\models\heads\rec_attn_head.py
Lines 14–40
mindocr.models.heads.rec_ctc_head
¶
mindocr.models.heads.rec_ctc_head.CTCHead
¶
Bases: nn.Cell
An MLP module for CTC Loss. For CRNN, the input should be of shape [W, BS, 2*C], which is output by RNNEncoder. The MLP encodes and classifies the features, then returns a logit tensor of shape [W, BS, num_classes]. For Chinese words, num_classes can be over 60,000, so weight regularization may matter.
Source code in mindocr\models\heads\rec_ctc_head.py
Lines 16–89
mindocr.models.heads.rec_ctc_head.CTCHead.construct(x)
¶Feed Forward construct.
| PARAMETER | DESCRIPTION |
|---|---|
x |
feature in shape [W, BS, 2*C]
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
h
|
if training, h is logits of shape [W, BS, num_classes], where W is the sequence length, fixed (dim order required by ms.ctcloss); if not training, h is class probabilities of shape [BS, W, num_classes].
TYPE:
|
Source code in mindocr\models\heads\rec_ctc_head.py
Lines 69–89
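The shape flow described above (training logits in [W, BS, num_classes] order, inference probabilities in [BS, W, num_classes] order) can be sketched with numpy. The single-matrix projection is an illustrative assumption, not CTCHead's actual layer structure:

```python
import numpy as np

rng = np.random.default_rng(0)
W, BS, C2, num_classes = 25, 2, 512, 37  # seq len, batch, 2*C, classes

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = rng.normal(size=(W, BS, C2)).astype(np.float32)          # RNNEncoder output
weight = (rng.normal(size=(C2, num_classes)) * 0.01).astype(np.float32)

logits = x @ weight                          # training output: [W, BS, num_classes]
probs = softmax(logits).transpose(1, 0, 2)   # inference output: [BS, W, num_classes]
print(logits.shape, probs.shape)
```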
mindocr.models.necks
¶
mindocr.models.necks.build_neck(neck_name, **kwargs)
¶
Build Neck network.
| PARAMETER | DESCRIPTION |
|---|---|
neck_name |
the neck name, which should be one of the supported_necks.
TYPE:
|
kwargs |
input args for the neck network
TYPE:
|
Return
nn.Cell for neck module
Construct
Example
build RNNEncoder¶
>>> from mindocr.models.necks import build_neck
>>> config = dict(neck_name='RNNEncoder', in_channels=128, hidden_size=256)
>>> neck = build_neck(**config)
>>> print(neck)
Source code in mindocr\models\necks\builder.py
Lines 9–33
mindocr.models.necks.asf
¶
mindocr.models.necks.asf.AdaptiveScaleFusion
¶
Bases: nn.Cell
Adaptive Scale Fusion module from the DBNet++ paper <https://arxiv.org/abs/2202.10304>.
| PARAMETER | DESCRIPTION |
|---|---|
channels |
number of input to and output channels from ASF
|
channel_attention |
use channel attention
DEFAULT:
|
Source code in mindocr\models\necks\asf.py
Lines 4–47
mindocr.models.necks.builder
¶
mindocr.models.necks.builder.build_neck(neck_name, **kwargs)
¶Build Neck network.
| PARAMETER | DESCRIPTION |
|---|---|
neck_name |
the neck name, which should be one of the supported_necks.
TYPE:
|
kwargs |
input args for the neck network
TYPE:
|
Return
nn.Cell for neck module
Construct
Example
build RNNEncoder¶
>>> from mindocr.models.necks import build_neck
>>> config = dict(neck_name='RNNEncoder', in_channels=128, hidden_size=256)
>>> neck = build_neck(**config)
>>> print(neck)
Source code in mindocr\models\necks\builder.py
Lines 9–33
mindocr.models.necks.fpn
¶
mindocr.models.necks.fpn.DBFPN
¶
Bases: nn.Cell
Source code in mindocr\models\necks\fpn.py
Lines 28–65
mindocr.models.necks.fpn.DBFPN.__init__(in_channels, out_channels=256, weight_init='HeUniform', bias=False, use_asf=False, channel_attention=True)
Example in_channels per backbone: resnet18 -> [64, 128, 256, 512]; resnet50 -> [2048, 1024, 512, 256].
bias: whether conv layers have bias. use_asf: use the ASF module for multi-scale feature aggregation (DBNet++ only). channel_attention: use channel attention in the ASF module.
Source code in mindocr\models\necks\fpn.py
Lines 29–53
mindocr.models.necks.img2seq
¶
mindocr.models.necks.img2seq.Img2Seq
¶
Bases: nn.Cell
Source code in mindocr\models\necks\img2seq.py
Lines 10–24
mindocr.models.necks.rnn
¶
mindocr.models.necks.rnn.RNNEncoder
¶
Bases: nn.Cell
CRNN sequence encoder, which contains reshape and bidirectional LSTM layers. It receives visual features of shape [N, C, 1, W], reshapes them to [W, N, C], and uses a Bi-LSTM to encode them into new features of shape [W, N, 2*C], where W is the sequence length, N the batch size, and C the feature length.
| PARAMETER | DESCRIPTION |
|---|---|
input_channels |
C, number of input channels, corresponding to feature length
TYPE:
|
hidden_size(int) |
the hidden size in LSTM layers, default is 512
|
Source code in mindocr\models\necks\rnn.py
Lines 11–63
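The reshape step described above, from visual features [N, C, 1, W] to a sequence [W, N, C], can be sketched in numpy (a sketch of the shape manipulation only; MindOCR uses MindSpore ops for this):

```python
import numpy as np

N, C, W = 2, 64, 25                               # batch, channels, width (seq len)
feats = np.zeros((N, C, 1, W), dtype=np.float32)  # visual features [N, C, 1, W]

# Squeeze the height axis and move the width axis first: [N, C, W] -> [W, N, C]
seq = feats.squeeze(2).transpose(2, 0, 1)
print(seq.shape)  # (25, 2, 64)

# A bidirectional LSTM over `seq` would then produce [W, N, 2*hidden_size];
# with hidden_size == C this matches the documented [W, N, 2*C] output.
```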
mindocr.models.necks.rnn.RNNEncoder.construct(features)
¶| PARAMETER | DESCRIPTION |
|---|---|
x |
feature, a Tensor of shape :math:
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Encoded features. Shape :math: |
Source code in mindocr\models\necks\rnn.py
Lines 42–63
mindocr.models.necks.select
¶
mindocr.models.necks.select.Select
¶
Bases: nn.Cell
Select a feature from the backbone output.
Source code in mindocr\models\necks\select.py
Lines 6–19
mindocr.models.rec_crnn
¶
mindocr.models.rec_rare
¶
mindocr.models.rec_svtr
¶
mindocr.models.utils
¶
mindocr.models.utils.load_model
¶
mindocr.models.utils.load_model.load_model(network, load_from=None, filter_fn=None, auto_mapping=False, strict=False)
¶Load the checkpoint into the model
| PARAMETER | DESCRIPTION |
|---|---|
network |
network
|
load_from |
a string that can be url or local path to a checkpoint, that will be loaded to the network.
TYPE:
|
filter_fn |
a function filtering the parameters that will be loaded into the network. If it is None, all parameters will be loaded.
TYPE:
|
auto_mapping |
if True, load the parameters even if their names are slightly different
TYPE:
|
strict |
if True, the shape and type of the parameters in the checkpoint and the network must be consistent; an exception is raised if they do not match.
TYPE:
|
Source code in mindocr\models\utils\load_model.py
Lines 28–78
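To make the role of `filter_fn` concrete, here is a hypothetical sketch of how such a predicate could prune a checkpoint's parameter dict before loading. The helper name `apply_filter` and the parameter names are illustrative, not MindOCR's actual internals:

```python
# Hypothetical sketch: keep only the checkpoint parameters whose names
# pass the filter predicate before loading them into the network.

def apply_filter(param_dict, filter_fn=None):
    if filter_fn is None:
        return dict(param_dict)  # None -> load everything
    return {k: v for k, v in param_dict.items() if filter_fn(k)}

ckpt = {
    "backbone.conv1.weight": 1,
    "head.fc.weight": 2,
    "head.fc.bias": 3,
}
# e.g. skip the head when fine-tuning on a new character set
kept = apply_filter(ckpt, filter_fn=lambda name: not name.startswith("head."))
print(sorted(kept))  # ['backbone.conv1.weight']
```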
mindocr.models.utils.rnn_cells
¶
RNN Cells that supports FP16 inputs
mindocr.models.utils.rnn_cells.GRUCell
¶
Bases: RNNCellBase
A GRU(Gated Recurrent Unit) cell.
.. math::
\begin{array}{ll}
r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\
h' = (1 - z) * n + z * h
\end{array}
Here :math:\sigma is the sigmoid function, and :math:* is the Hadamard product. :math:W, b
are learnable weights between the output and the input in the formula. For instance,
:math:W_{ir}, b_{ir} are the weight and bias used to transform from input :math:x to :math:r.
Details can be found in paper
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
<https://aclanthology.org/D14-1179.pdf>_.
The GRUCell can be simplified to the following formula:
.. math:: h' = GRUCell(x, h)
| PARAMETER | DESCRIPTION |
|---|---|
input_size |
Number of features of input.
TYPE:
|
hidden_size |
Number of features of hidden layer.
TYPE:
|
has_bias |
Whether the cell has bias
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape (batch_size, input_size).
- hx (Tensor) - Tensor of data type mindspore.float32 and shape (batch_size, hidden_size). Data type of hx must be the same as x.
Outputs
- hx' (Tensor) - Tensor of shape (batch_size,
hidden_size).
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If |
TypeError
|
If |
Supported Platforms
Ascend GPU CPU
Examples:
>>> net = nn.GRUCell(10, 16)
>>> x = Tensor(np.ones([5, 3, 10]).astype(np.float32))
>>> hx = Tensor(np.ones([3, 16]).astype(np.float32))
>>> output = []
>>> for i in range(5):
... hx = net(x[i], hx)
... output.append(hx)
>>> print(output[0].shape)
(3, 16)
Source code in mindocr\models\utils\rnn_cells.py
Lines 36–105
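The gate equations above can be sketched directly in numpy. The PyTorch-style stacked weight layout (W_ir, W_iz, W_in concatenated) is an assumption for illustration, not necessarily MindOCR's storage layout:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W_ih, W_hh, b_ih, b_hh):
    """One GRU step following the documented equations.
    W_ih stacks (W_ir, W_iz, W_in); W_hh stacks (W_hr, W_hz, W_hn)."""
    gi = x @ W_ih.T + b_ih                 # input projections, [batch, 3*hidden]
    gh = h @ W_hh.T + b_hh                 # hidden projections
    i_r, i_z, i_n = np.split(gi, 3, axis=1)
    h_r, h_z, h_n = np.split(gh, 3, axis=1)
    r = sigmoid(i_r + h_r)                 # reset gate
    z = sigmoid(i_z + h_z)                 # update gate
    n = np.tanh(i_n + r * h_n)             # candidate state
    return (1 - z) * n + z * h             # h'

rng = np.random.default_rng(0)
batch, input_size, hidden = 3, 10, 16
x = rng.normal(size=(batch, input_size))
h = np.zeros((batch, hidden))
W_ih = rng.normal(size=(3 * hidden, input_size)) * 0.1
W_hh = rng.normal(size=(3 * hidden, hidden)) * 0.1
b_ih = np.zeros(3 * hidden)
b_hh = np.zeros(3 * hidden)
print(gru_cell(x, h, W_ih, W_hh, b_ih, b_hh).shape)  # (3, 16)
```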
mindocr.optim
¶
optim init
mindocr.optim.adamw
¶
Gradient clipping wrapper for optimizers.
mindocr.optim.adamw.AdamW
¶
Bases: Optimizer
Implements gradient clipping by norm for an AdamWeightDecay optimizer.
Source code in mindocr\optim\adamw.py
Lines 136–216
mindocr.optim.adamw.tensor_grad_scale(scale, grad)
¶
Get grad with scale.
Source code in mindocr\optim\adamw.py
Lines 34–39
mindocr.optim.adamw.tensor_grad_scale_with_tensor(scale, grad)
¶
Get grad with scale.
Source code in mindocr\optim\adamw.py
Lines 42–45
mindocr.optim.adan
¶
adan
mindocr.optim.adan.Adan
¶
Bases: Optimizer
The Adan (ADAptive Nesterov momentum algorithm) Optimizer from https://arxiv.org/abs/2208.06677
Note: it is an experimental version.
Source code in mindocr\optim\adan.py
Lines 108–180
mindocr.optim.adan.Adan.target(value)
¶If the input value is set to "CPU", the parameters will be updated on the host using the Fused optimizer operation.
Source code in mindocr\optim\adan.py
Lines 174–180
mindocr.optim.lion
¶
mindocr.optim.lion.Lion
¶
Bases: Optimizer
Implementation of the Lion optimizer from the paper 'https://arxiv.org/abs/2302.06675'. Additionally, this implementation includes gradient clipping.
Notes: the learning rate is usually 3-10x smaller than for AdamW, and the weight decay is usually 3-10x larger than for AdamW.
Source code in mindocr\optim\lion.py
Lines 118–195
mindocr.optim.lion.tensor_grad_scale(scale, grad)
¶
Get grad with scale.
Source code in mindocr\optim\lion.py
Lines 28–33
mindocr.optim.lion.tensor_grad_scale_with_tensor(scale, grad)
¶
Get grad with scale.
Source code in mindocr\optim\lion.py
Lines 36–39
mindocr.optim.nadam
¶
nadam
mindocr.optim.nadam.NAdam
¶
Bases: Optimizer
Implements NAdam algorithm (a variant of Adam based on Nesterov momentum).
Source code in mindocr\optim\nadam.py
Lines 33–95
mindocr.optim.optim_factory
¶
optim factory
mindocr.optim.optim_factory.create_optimizer(params, opt='adam', lr=0.001, weight_decay=0, momentum=0.9, nesterov=False, filter_bias_and_bn=True, loss_scale=1.0, schedule_decay=0.004, checkpoint_path='', eps=1e-10, **kwargs)
¶
Creates optimizer by name.
| PARAMETER | DESCRIPTION |
|---|---|
params |
network parameters. Union[list[Parameter],list[dict]], which must be the list of parameters or list of dicts. When the list element is a dictionary, the key of the dictionary can be "params", "lr", "weight_decay","grad_centralization" and "order_params".
|
opt |
wrapped optimizer. You could choose from 'sgd', 'nesterov', 'momentum', 'adam', 'adamw', 'lion', 'rmsprop', 'adagrad', 'lamb'. 'adam' is the default choice for convolution-based networks. 'adamw' is recommended for ViT-based networks. Default: 'adam'.
TYPE:
|
lr |
learning rate: float or lr scheduler. Fixed and dynamic learning rate are supported. Default: 1e-3.
TYPE:
|
weight_decay |
weight decay factor. It should be noted that weight decay can be a constant value or a Cell. It is a Cell only when dynamic weight decay is applied. Dynamic weight decay is similar to dynamic learning rate, users need to customize a weight decay schedule only with global step as input, and during training, the optimizer calls the instance of WeightDecaySchedule to get the weight decay value of current step. Default: 0.
TYPE:
|
momentum |
momentum if the optimizer supports. Default: 0.9.
TYPE:
|
nesterov |
Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. Default: False.
TYPE:
|
filter_bias_and_bn |
whether to filter batch norm parameters and bias from weight decay. If True, weight decay will not apply on BN parameters and bias in Conv or Dense layers. Default: True.
TYPE:
|
loss_scale |
A floating point value for the loss scale, which must be larger than 0.0. Default: 1.0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Optimizer object |
Source code in mindocr\optim\optim_factory.py
Lines 32–197
mindocr.optim.param_grouping
¶
group parameters for setting different weight decay or learning rate for different layers in the network.
mindocr.optim.param_grouping.create_group_params(params, weight_decay=0, grouping_strategy=None, no_weight_decay_params=[], **kwargs)
¶
create group parameters for setting different weight decay or learning rate for different layers in the network.
| PARAMETER | DESCRIPTION |
|---|---|
params |
network params
|
weight_decay |
weight decay value
TYPE:
|
grouping_strategy |
name of the hard-coded grouping strategy. If not None, group parameters according to
the predefined function and
TYPE:
|
no_weight_decay_params |
list of the param name substrings that will be picked to exclude from
weight decay. If a parameter name contains one of the substrings in the list, weight decay will not be
applied to that parameter. (Tips: param names can be checked by
TYPE:
|
Return
list[dict], grouped parameters
Source code in mindocr\optim\param_grouping.py
Lines 58–120
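The substring-based exclusion described above can be sketched as follows. The `group_params` helper and the two-group layout are illustrative assumptions, not the exact MindOCR grouping logic:

```python
# Sketch of substring-based parameter grouping: parameters whose names
# contain any of the listed substrings get zero weight decay.

def group_params(named_params, weight_decay, no_weight_decay_params):
    decay, no_decay = [], []
    for name, p in named_params:
        if any(sub in name for sub in no_weight_decay_params):
            no_decay.append(p)
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# strings stand in for Parameter objects
params = [("conv.weight", "w0"), ("bn.gamma", "w1"), ("fc.bias", "w2")]
groups = group_params(params, 1e-4, no_weight_decay_params=["bn", "bias"])
print([g["params"] for g in groups])  # [['w0'], ['w1', 'w2']]
```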
mindocr.postprocess
¶
mindocr.postprocess.build_postprocess(config)
¶
Create postprocess function.
| PARAMETER | DESCRIPTION |
|---|---|
config |
configuration for postprocess including postprocess
TYPE:
|
Return
Object
Example
Create postprocess function¶
>>> from mindocr.postprocess import build_postprocess
>>> config = dict(name="RecCTCLabelDecode", use_space_char=False)
>>> postprocess = build_postprocess(config)
>>> postprocess
Source code in mindocr\postprocess\builder.py
Lines 19–46
mindocr.postprocess.builder
¶
mindocr.postprocess.builder.build_postprocess(config)
¶
Create postprocess function.
| PARAMETER | DESCRIPTION |
|---|---|
config |
configuration for postprocess including postprocess
TYPE:
|
Return
Object
Example
Create postprocess function¶
>>> from mindocr.postprocess import build_postprocess
>>> config = dict(name="RecCTCLabelDecode", use_space_char=False)
>>> postprocess = build_postprocess(config)
>>> postprocess
Source code in mindocr\postprocess\builder.py
Lines 19–46
mindocr.postprocess.cls_postprocess
¶
mindocr.postprocess.cls_postprocess.ClsPostprocess
¶
Bases: object
Map the predicted index back to the original format (angle).
Source code in mindocr\postprocess\cls_postprocess.py
Lines 6–27
mindocr.postprocess.det_base_postprocess
¶
mindocr.postprocess.det_base_postprocess.DetBasePostprocess
¶
Base class for all text detection postprocessing.
| PARAMETER | DESCRIPTION |
|---|---|
box_type |
text region representation type after postprocessing, options: ['quad', 'poly']
TYPE:
|
rescale_fields |
names of fields to rescale back to the shape of the original image.
TYPE:
|
Source code in mindocr\postprocess\det_base_postprocess.py
Lines 11–147
mindocr.postprocess.det_base_postprocess.DetBasePostprocess.__call__(pred, shape_list=None, **kwargs)
Execution entry for postprocessing: postprocess the network prediction in the transformed image space to get text boxes, then rescale them back to the original image space.
| PARAMETER | DESCRIPTION |
|---|---|
pred |
network prediction for input batch data, shape [batch_size, ...]
TYPE:
|
shape_list |
shape and scale info for each image in the batch, shape [batch_size, 4]. Each internal array is [src_h, src_w, scale_h, scale_w], where src_h and src_w are height and width of the original image, and scale_h and scale_w are their scale ratio during image resizing.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
detection result as a dict with keys:
- polys (List[List[np.ndarray]]): predicted polygons mapped onto the original image space,
shape [batch_size, num_polygons, num_points, 2]. If |
Source code in mindocr\postprocess\det_base_postprocess.py
Lines 52–106
mindocr.postprocess.det_base_postprocess.DetBasePostprocess.rescale(result, shape_list)
Rescale the result back to the original image shape.
| PARAMETER | DESCRIPTION |
|---|---|
shape_list |
image shape and scale info, shape [batch_size, 4]
TYPE:
|
Return
rescaled result specified by rescale_field
Source code in mindocr\postprocess\det_base_postprocess.py
Lines 125–147
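The rescaling itself is a per-image division by the resize ratios from `shape_list` ([src_h, src_w, scale_h, scale_w]). A numpy sketch of the operation for one image's polygons (the helper name is hypothetical):

```python
import numpy as np

def rescale_polys(polys, shape_entry):
    """Map polygons from the resized-image space back to the original image.
    shape_entry = [src_h, src_w, scale_h, scale_w]."""
    src_h, src_w, scale_h, scale_w = shape_entry
    polys = np.asarray(polys, dtype=np.float64)
    polys[..., 0] /= scale_w   # x coordinates shrink/grow by the width ratio
    polys[..., 1] /= scale_h   # y coordinates by the height ratio
    return polys

# a quad detected on an image that was resized by 0.5 in both dimensions
quad = [[[10, 20], [50, 20], [50, 40], [10, 40]]]
restored = rescale_polys(quad, [800, 600, 0.5, 0.5])
print(restored[0][0])  # [20. 40.]
```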
mindocr.postprocess.det_db_postprocess
¶
mindocr.postprocess.det_db_postprocess.DBPostprocess
¶
Bases: DetBasePostprocess
DBNet & DBNet++ postprocessing pipeline: extracts polygons / rectangles from a binary map (heatmap) and returns their coordinates.
| PARAMETER | DESCRIPTION |
|---|---|
binary_thresh |
binarization threshold applied to the heatmap output of DBNet.
TYPE:
|
box_thresh |
polygon confidence threshold. Polygons with scores lower than this threshold are filtered out.
TYPE:
|
max_candidates |
maximum number of proposed polygons.
TYPE:
|
expand_ratio |
controls by how much polygons need to be expanded to recover the original text shape (DBNet predicts shrunken text masks).
TYPE:
|
box_type |
output polygons ('polys') or rectangles ('quad') as the network's predictions.
DEFAULT:
|
pred_name |
name of the heatmap used for polygon extraction.
TYPE:
|
rescale_fields |
name of fields to scale back to the shape of the original image.
TYPE:
|
Source code in mindocr\postprocess\det_db_postprocess.py
Lines 15–178
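The two thresholds above play different roles: `binary_thresh` binarizes the heatmap into a text mask, while `box_thresh` filters low-confidence candidates by their mean heatmap score. A simplified numpy sketch (the real pipeline finds contours per region with OpenCV; here a whole-mask mean score stands in for the per-polygon score):

```python
import numpy as np

def binarize_and_score(heatmap, binary_thresh=0.3, box_thresh=0.7):
    """Binarize the heatmap, then score the resulting region.
    Simplification: one global region instead of per-contour polygons."""
    mask = heatmap > binary_thresh                      # text/background mask
    score = float(heatmap[mask].mean()) if mask.any() else 0.0
    return mask, score, score >= box_thresh             # keep region or not

heat = np.array([[0.90, 0.80, 0.10],
                 [0.85, 0.90, 0.10],
                 [0.10, 0.10, 0.10]])
mask, score, keep = binarize_and_score(heat)
print(int(mask.sum()), keep)  # 4 True
```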
mindocr.postprocess.det_east_postprocess
¶
mindocr.postprocess.det_east_postprocess.EASTPostprocess
¶
Bases: DetBasePostprocess
Source code in mindocr\postprocess\det_east_postprocess.py
Lines 13–135
mindocr.postprocess.det_pse_postprocess
¶
mindocr.postprocess.det_pse_postprocess.PSEPostprocess
¶
Bases: DetBasePostprocess
Source code in mindocr\postprocess\det_pse_postprocess.py
Lines 12–103
mindocr.postprocess.rec_postprocess
¶
mindocr.postprocess.rec_postprocess.RecAttnLabelDecode
¶
Source code in mindocr\postprocess\rec_postprocess.py
Lines 161–275
mindocr.postprocess.rec_postprocess.RecAttnLabelDecode.__call__(preds, labels=None, **kwargs)
¶| PARAMETER | DESCRIPTION |
|---|---|
preds |
containing prediction tensor in shape [BS, W, num_classes]
TYPE:
|
Return
texts (List[Tuple]): list of string
Source code in mindocr\postprocess\rec_postprocess.py
Lines 255–275
mindocr.postprocess.rec_postprocess.RecAttnLabelDecode.__init__(character_dict_path=None, use_space_char=False, lower=False)
¶Convert text label (str) to a sequence of character indices according to the char dictionary
| PARAMETER | DESCRIPTION |
|---|---|
character_dict_path |
path to dictionary, if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
TYPE:
|
use_space_char(bool) |
if True, add the space char to the dict to recognize the space between two words
|
lower |
if True, all upper-case chars in the label text will be converted to lower case. Set to True if the dictionary only contains lower-case chars. Set to False otherwise, if you want to recognize both upper-case and lower-case characters.
TYPE:
|
| ATTRIBUTE | DESCRIPTION |
|---|---|
go_idx |
the index of the GO token
|
stop_idx |
the index of the STOP token
|
num_valid_chars |
the number of valid characters (including space char if used) in the dictionary
|
num_classes |
the number of classes (the valid characters plus the special token for blank padding), so num_classes = num_valid_chars + 1
|
Source code in mindocr\postprocess\rec_postprocess.py
Lines 162–222
mindocr.postprocess.rec_postprocess.RecCTCLabelDecode
¶
Bases: object
Convert text label (str) to a sequence of character indices according to the char dictionary
| PARAMETER | DESCRIPTION |
|---|---|
character_dict_path |
path to dictionary, if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
DEFAULT:
|
use_space_char(bool) |
if True, add the space char to the dict to recognize the space between two words
|
blank_at_last(bool) |
padding with the blank index (not the space index). If True, a blank/padding token will be appended to the end of the dictionary, so that blank_index = num_chars, where num_chars is the number of characters in the dictionary, including the space char if used. If False, the blank token will be inserted at the beginning of the dictionary, so blank_index = 0.
|
lower |
if True, all upper-case chars in the label text will be converted to lower case. Set to True if the dictionary only contains lower-case chars. Set to False otherwise, if you want to recognize both upper-case and lower-case characters.
TYPE:
|
| ATTRIBUTE | DESCRIPTION |
|---|---|
blank_idx |
the index of the blank token for padding
|
num_valid_chars |
the number of valid characters (including space char if used) in the dictionary
|
num_classes |
the number of classes (the valid characters plus the special token for blank padding), so num_classes = num_valid_chars + 1
|
Source code in mindocr\postprocess\rec_postprocess.py
Lines 12–158
mindocr.postprocess.rec_postprocess.RecCTCLabelDecode.__call__(preds, labels=None, **kwargs)
¶| PARAMETER | DESCRIPTION |
|---|---|
preds |
network prediction, class probabilities in shape [BS, W, num_classes], where W is the sequence length.
TYPE:
|
labels |
optional
DEFAULT:
|
Return
texts (List[Tuple]): list of string
Source code in mindocr\postprocess\rec_postprocess.py
Lines 130–158
mindocr.postprocess.rec_postprocess.RecCTCLabelDecode.decode(char_indices, prob=None, remove_duplicate=False)
Convert a sequence of char indices to a text string.
| PARAMETER | DESCRIPTION |
|---|---|
char_indices |
in shape [BS, W]
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
text |
Source code in mindocr\postprocess\rec_postprocess.py
Lines 92–128
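Greedy CTC decoding, which the `remove_duplicate` flag above refers to, takes the argmax class per time step, collapses consecutive repeats, and drops blanks. A self-contained numpy sketch (not the actual MindOCR decode implementation):

```python
import numpy as np

def ctc_greedy_decode(probs, charset, blank_idx):
    """Greedy CTC decoding: argmax per step, collapse repeats, drop blanks."""
    indices = probs.argmax(axis=-1)           # [W] best class per time step
    chars, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank_idx:  # remove duplicates, skip blank
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)

charset = "abc"                  # blank appended at last: index 3
# per-step class indices, encoded as one-hot probabilities: a a blank b b c
steps = [0, 0, 3, 1, 1, 2]
probs = np.eye(4)[steps]
print(ctc_greedy_decode(probs, charset, blank_idx=3))  # "abc"
```

Note how the blank between the two runs of 'a' and 'b' is what allows genuinely repeated characters to survive decoding.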
mindocr.scheduler
¶
Learning Rate Scheduler
mindocr.scheduler.dynamic_lr
¶
Meta learning rate scheduler.
This module implements exactly the same learning rate schedulers as native PyTorch,
see "torch.optim.lr_scheduler" <https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate>_.
At present, only constant_lr, linear_lr, polynomial_lr, exponential_lr, step_lr, multi_step_lr,
cosine_annealing_lr, cosine_annealing_warm_restarts_lr are implemented. The number, names and usage of
the positional arguments are exactly the same as those of native PyTorch.
However, due to the constraint of having to explicitly return the learning rate at each step, we have to
introduce additional keyword arguments. There are only three keyword arguments introduced,
namely lr, steps_per_epoch and epochs, explained as follows:
lr: the basic learning rate when creating the optimizer in torch.
steps_per_epoch: the number of steps (iterations) in each epoch.
epochs: the number of epochs. Together with steps_per_epoch, it determines the length of the returned lrs.
Most schedulers in PyTorch are coarse-grained, that is, the learning rate is constant within a single epoch.
For non-stepwise schedulers, we introduce several fine-grained variants in which the learning rate
also changes within a single epoch. The function names of these variants contain the refined keyword.
The implemented fine-grained variants are listed as follows: linear_refined_lr, polynomial_refined_lr, etc.
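Because each scheduler must return the learning rate for every step explicitly, the output is simply a list of length steps_per_epoch * epochs. A sketch of a per-step ("refined") cosine annealing schedule, as an illustration of the contract rather than MindOCR's exact formula:

```python
import math

def cosine_annealing_refined_lr(lr, eta_min, steps_per_epoch, epochs):
    """Per-step cosine annealing from `lr` down to `eta_min`.
    Returns one learning rate per training step."""
    total = steps_per_epoch * epochs
    return [
        eta_min + 0.5 * (lr - eta_min) * (1 + math.cos(math.pi * i / (total - 1)))
        for i in range(total)
    ]

lrs = cosine_annealing_refined_lr(lr=0.01, eta_min=1e-6, steps_per_epoch=10, epochs=3)
print(len(lrs))  # 30 values: starts at lr, decays smoothly to eta_min
```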
mindocr.scheduler.dynamic_lr.cosine_decay_lr(decay_epochs, eta_min, *, eta_max, steps_per_epoch, epochs, num_cycles=1, cycle_decay=1.0)
¶
update every epoch
Source code in mindocr\scheduler\dynamic_lr.py (lines 117–138)
mindocr.scheduler.dynamic_lr.cosine_decay_refined_lr(decay_epochs, eta_min, *, eta_max, steps_per_epoch, epochs, num_cycles=1, cycle_decay=1.0)
¶
update every step
Source code in mindocr\scheduler\dynamic_lr.py (lines 141–162)
mindocr.scheduler.multi_step_decay_lr
¶
MultiStep Decay Learning Rate Scheduler
mindocr.scheduler.multi_step_decay_lr.MultiStepDecayLR
¶
Bases: LearningRateSchedule
Multi-step decay learning rate. The learning rate decays once the step count reaches one of the milestones.
Source code in mindocr\scheduler\multi_step_decay_lr.py (lines 7–35)
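The milestone behaviour can be sketched as follows (a pure-Python sketch of the multi-step policy; the function name and epoch-based milestones are assumptions):

```python
def multi_step_lr(lr, milestones, decay_rate, epoch):
    # The lr is multiplied by decay_rate once for every milestone already reached.
    passed = sum(1 for m in milestones if epoch >= m)
    return lr * decay_rate ** passed
```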
mindocr.scheduler.scheduler_factory
¶
Scheduler Factory
mindocr.scheduler.scheduler_factory.create_scheduler(steps_per_epoch, scheduler='constant', lr=0.01, min_lr=1e-06, warmup_epochs=3, warmup_factor=0.0, decay_epochs=10, decay_rate=0.9, milestones=None, num_epochs=200, num_cycles=1, cycle_decay=1.0, lr_epoch_stair=False)
¶
Creates learning rate scheduler by name.
| PARAMETER | DESCRIPTION |
|---|---|
| steps_per_epoch | number of steps per epoch |
| scheduler | scheduler name: 'constant', 'cosine_decay', 'step_decay', 'exponential_decay', 'polynomial_decay', or 'multi_step_decay'. Default: 'constant'. |
| lr | learning rate value. Default: 0.01. |
| min_lr | lower lr bound for 'cosine_decay' schedulers. Default: 1e-6. |
| warmup_epochs | epochs to warm up the LR, if the scheduler supports it. Default: 3. |
| warmup_factor | the warmup phase is a linearly increasing lr starting from warmup_factor * lr. Default: 0.0. |
| decay_epochs | for 'cosine_decay' schedulers, decay the LR to min_lr over decay_epochs. Default: 10. |
| decay_rate | LR decay rate. Default: 0.9. |
| milestones | list of epoch milestones for the 'multi_step_decay' scheduler. Must be increasing. |
| num_epochs | number of total epochs. Default: 200. |
| lr_epoch_stair | if True, the LR is updated at the beginning of each epoch and stays constant for every batch within that epoch; otherwise, the LR is updated at every step. Default: False. |

| RETURNS | DESCRIPTION |
|---|---|
| | Cell object for computing the LR given the current global step |

Source code in mindocr\scheduler\scheduler_factory.py (lines 18–119)
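The documented warmup behaviour (a linear increase starting at warmup_factor times the base lr) can be sketched as a step-wise function. This is an assumption-level sketch, not the factory's actual implementation:

```python
def warmup_lr(lr, warmup_factor, warmup_steps, step):
    # Linearly increase from warmup_factor * lr to lr over warmup_steps steps.
    if warmup_steps <= 0 or step >= warmup_steps:
        return lr
    alpha = step / warmup_steps
    return lr * (warmup_factor + (1.0 - warmup_factor) * alpha)
```

After the warmup phase, the chosen decay scheduler (cosine, step, polynomial, ...) takes over.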
mindocr.scheduler.warmup_cosine_decay_lr
¶
Cosine Decay with Warmup Learning Rate Scheduler
mindocr.scheduler.warmup_cosine_decay_lr.WarmupCosineDecayLR
¶
Bases: LearningRateSchedule
CosineDecayLR with warmup
| PARAMETER | DESCRIPTION |
|---|---|
| min_lr | (float) lower lr bound for 'WarmupCosineDecayLR' schedulers |
| max_lr | (float) upper lr bound for 'WarmupCosineDecayLR' schedulers |
| warmup_epochs | (int) the number of warmup epochs of the learning rate |
| decay_epochs | (int) the number of decay epochs of the learning rate |
| steps_per_epoch | (int) the number of steps per epoch |
| step_mode | (bool) whether to decay along steps (True) or epochs (False) |

The learning rate increases from 0 to max_lr over warmup_epochs epochs,
then decays to min_lr over decay_epochs epochs.
Source code in mindocr\scheduler\warmup_cosine_decay_lr.py (lines 6–68)
mindocr.utils
¶
mindocr.utils.callbacks
¶
mindocr.utils.callbacks.EvalSaveCallback
¶
Bases: Callback
Callbacks for evaluation while training
| PARAMETER | DESCRIPTION |
|---|---|
| network | network (without loss) |
| loader | dataloader |
| ema | if not None, the EMA params will be loaded into the network for evaluation |

Source code in mindocr\utils\callbacks.py (lines 26–258)
mindocr.utils.callbacks.EvalSaveCallback.on_train_epoch_begin(run_context)
¶Called at the beginning of each epoch.
| PARAMETER | DESCRIPTION |
|---|---|
| run_context | includes some information about the model |

Source code in mindocr\utils\callbacks.py (lines 149–157)
mindocr.utils.callbacks.EvalSaveCallback.on_train_epoch_end(run_context)
¶Called at the end of each training epoch.
| PARAMETER | DESCRIPTION |
|---|---|
| run_context | includes some information about the model |

Source code in mindocr\utils\callbacks.py (lines 159–247)
mindocr.utils.callbacks.EvalSaveCallback.on_train_step_end(run_context)
¶Print the training loss at the end of each step.
| PARAMETER | DESCRIPTION |
|---|---|
| run_context | context of the training run |

Source code in mindocr\utils\callbacks.py (lines 120–147)
mindocr.utils.checkpoint
¶
checkpoint manager
mindocr.utils.checkpoint.CheckpointManager
¶
Manage checkpoint files according to the ckpt_save_policy.
| PARAMETER | DESCRIPTION |
|---|---|
| ckpt_save_dir | directory to save the checkpoints |
| ckpt_save_policy | checkpoint saving strategy. Options: None, "top_k", or "latest_k". None saves every checkpoint, "top_k" saves the K checkpoints with the best performance, and "latest_k" saves the latest K checkpoints. Default: top_k. |
| k | top k value |
| prefer_low_perf | criterion for selecting the top k. If False, pick the k checkpoints with the highest performance, e.g. accuracy; if True, pick the k with the lowest, e.g. loss. |

Source code in mindocr\utils\checkpoint.py (lines 9–98)
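The top_k policy can be sketched with a bounded heap. The class below is a hypothetical TopKKeeper illustrating the selection logic, not the actual CheckpointManager API:

```python
import heapq

class TopKKeeper:
    """Keep the k best (perf, name) pairs; a sketch of the top_k policy."""

    def __init__(self, k, prefer_low_perf=False):
        self.k = k
        self.prefer_low_perf = prefer_low_perf
        self._heap = []  # min-heap keyed so the root is the worst kept item

    def add(self, perf, name):
        # Negate perf when lower is better (e.g. loss), so the heap root
        # is always the candidate to evict.
        key = -perf if self.prefer_low_perf else perf
        heapq.heappush(self._heap, (key, name))
        if len(self._heap) > self.k:
            _, dropped = heapq.heappop(self._heap)
            return dropped  # checkpoint file that should be deleted
        return None

    def names(self):
        return {name for _, name in self._heap}
```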
mindocr.utils.checkpoint.CheckpointManager.ckpt_num
property
¶Get the number of the related checkpoint files managed here.
mindocr.utils.checkpoint.CheckpointManager.get_ckpt_queue()
¶Get all the related checkpoint files managed here.
Source code in mindocr\utils\checkpoint.py (lines 33–35)
mindocr.utils.checkpoint.CheckpointManager.remove_ckpt_file(file_name)
¶Remove the specified checkpoint file from this checkpoint manager and also from the directory.
Source code in mindocr\utils\checkpoint.py (lines 42–51)
mindocr.utils.checkpoint.CheckpointManager.save(network, perf=None, ckpt_name=None)
¶Save a checkpoint according to the configured save strategy.
Source code in mindocr\utils\checkpoint.py (lines 81–98)
mindocr.utils.checkpoint.CheckpointManager.save_latest_k(network, ckpt_name)
¶Save the latest K checkpoints.
Source code in mindocr\utils\checkpoint.py (lines 69–76)
mindocr.utils.checkpoint.CheckpointManager.save_top_k(network, perf, ckpt_name, verbose=True)
¶Save and return Top K checkpoint address and accuracy.
Source code in mindocr\utils\checkpoint.py (lines 53–67)
mindocr.utils.ema
¶
mindocr.utils.ema.EMA
¶
Bases: nn.Cell
| PARAMETER | DESCRIPTION |
|---|---|
| updates | number of EMA updates, which can be restored from resumed training |

Source code in mindocr\utils\ema.py (lines 14–54)
mindocr.utils.ema.EMA.ema_update()
¶Update EMA parameters.
Source code in mindocr\utils\ema.py (lines 33–40)
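EMA parameter updating follows the standard exponential moving average. A minimal sketch, using plain dicts instead of MindSpore Parameters and an assumed decay value:

```python
def ema_update(ema_params, net_params, decay=0.9999):
    # ema = decay * ema + (1 - decay) * net, applied parameter-wise
    for name, value in net_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params
```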
mindocr.utils.evaluator
¶
mindocr.utils.evaluator.Evaluator
¶
| PARAMETER | DESCRIPTION |
|---|---|
| network | network |
| dataloader | data loader to generate batch data, where the data columns in a batch are defined by the transform pipeline |
| loss_fn | loss function |
| postprocessor | post-processor |
| metrics | metrics to evaluate network performance |
| pred_cast_fp32 | whether to cast the network prediction to float32. Set True if AMP is used. |
| input_indices | indices of the data tuple items to feed into the network. If None, only the first item is fed. |
| label_indices | indices of the data tuple items to mark as labels. If None, the remaining items are marked as labels. |
| meta_data_indices | indices of the data tuple items to mark as metadata. If None, the items not in the input or label indices are marked as metadata. |

Source code in mindocr\utils\evaluator.py (lines 12–165)
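The index semantics of input_indices and label_indices described above can be sketched as follows (a hypothetical helper, not part of mindocr):

```python
def split_batch(batch, input_indices=None, label_indices=None):
    """Split a data tuple into network inputs and labels.

    If input_indices is None, only the first item is an input;
    if label_indices is None, all remaining items are labels.
    """
    if input_indices is None:
        input_indices = [0]
    inputs = [batch[i] for i in input_indices]
    if label_indices is None:
        used = set(input_indices)
        label_indices = [i for i in range(len(batch)) if i not in used]
    labels = [batch[i] for i in label_indices]
    return inputs, labels
```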
mindocr.utils.evaluator.Evaluator.eval()
¶Source code in mindocr\utils\evaluator.py (lines 96–165)
mindocr.utils.logger
¶
Custom Logger.
mindocr.utils.logger.Logger
¶
Bases: logging.Logger
Logger.
| PARAMETER | DESCRIPTION |
|---|---|
| logger_name | String. Logger name. |
| rank | Integer. Rank id. |

Source code in mindocr\utils\logger.py (lines 7–64)
mindocr.utils.logger.Logger.setup_logging_file(log_dir)
¶Setup logging file.
Source code in mindocr\utils\logger.py (lines 29–42)
mindocr.utils.logger.get_logger(log_dir, rank, log_fn=None)
¶
Get Logger.
Source code in mindocr\utils\logger.py (lines 67–72)
mindocr.utils.loss_scaler
¶
mindocr.utils.loss_scaler.get_loss_scales(cfg)
¶
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | configuration dict of the loss scaler |

| RETURNS | DESCRIPTION |
|---|---|
| nn.Cell | scale_sens, used to scale the gradient |
| float | loss_scale, used in the optimizer (only when the loss scaler type is static and drop_overflow_update is False) |

Source code in mindocr\utils\loss_scaler.py (lines 4–52)
mindocr.utils.misc
¶
mindocr.utils.misc.AverageMeter
¶
Computes and stores the average and current value
Source code in mindocr\utils\misc.py (lines 5–21)
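A typical AverageMeter matching the one-line description looks like the following; the attribute names are assumptions, not necessarily MindOCR's:

```python
class AverageMeter:
    """Computes and stores the average and current value."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0.0   # most recent value
        self.avg = 0.0   # running average
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
```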
mindocr.utils.model_wrapper
¶
mindocr.utils.model_wrapper.NetWithEvalWrapper
¶
Bases: nn.Cell
A universal wrapper for any network with any loss for evaluation pipeline. Difference from NetWithLossWrapper: it returns loss_val, pred, and labels.
| PARAMETER | DESCRIPTION |
|---|---|
| net | network |
| loss_fn | loss function; if None, loss is not computed for the evaluation dataset |
| input_indices | indices of the data tuple items to feed into the network. If None, only the first item is fed. |
| label_indices | indices of the data tuple items to feed into the loss function. If None, the remaining items are fed. |

Source code in mindocr\utils\model_wrapper.py (lines 55–100)
mindocr.utils.model_wrapper.NetWithEvalWrapper.construct(*args)
¶| PARAMETER | DESCRIPTION |
|---|---|
| args | network inputs and labels (given by the data loader) |

| RETURNS | DESCRIPTION |
|---|---|
| Tuple | loss value (Tensor), pred (Union[Tensor, Tuple[Tensor]]), labels (Tuple) |

Source code in mindocr\utils\model_wrapper.py (lines 77–100)
mindocr.utils.model_wrapper.NetWithLossWrapper
¶
Bases: nn.Cell
A universal wrapper for any network with any loss.
| PARAMETER | DESCRIPTION |
|---|---|
| net | network |
| loss_fn | loss function |
| input_indices | indices of the data tuple items to feed into the network. If None, only the first item is fed. |
| label_indices | indices of the data tuple items to feed into the loss function. If None, the remaining items are fed. |

Source code in mindocr\utils\model_wrapper.py (lines 6–52)
mindocr.utils.model_wrapper.NetWithLossWrapper.construct(*args)
¶| PARAMETER | DESCRIPTION |
|---|---|
| args | network inputs and labels (given by the data loader) |

| RETURNS | DESCRIPTION |
|---|---|
| loss_val | loss value |

Source code in mindocr\utils\model_wrapper.py (lines 29–52)
mindocr.utils.recorder
¶
mindocr.utils.recorder.PerfRecorder
¶
Bases: object
Source code in mindocr\utils\recorder.py (lines 51–97)
mindocr.utils.recorder.PerfRecorder.add(epoch, *measures)
¶measures (Tuple): measurement values corresponding to the metric names
Source code in mindocr\utils\recorder.py (lines 72–94)
mindocr.utils.seed
¶
random seed
mindocr.utils.seed.set_seed(seed=42)
¶
Note: to ensure model init stability, rank_id is removed from seed.
Source code in mindocr\utils\seed.py (lines 9–19)
mindocr.utils.train_step_wrapper
¶
Train step wrapper supporting drop-overflow update, EMA, etc.
mindocr.utils.train_step_wrapper.TrainOneStepWrapper
¶
Bases: nn.TrainOneStepWithLossScaleCell
TrainStep with ema and clip grad.
| PARAMETER | DESCRIPTION |
|---|---|
| drop_overflow_update | if True, the network will not be updated when the gradient overflows |
| scale_sense | if this value is a Cell, it will be called to update the loss scale; if this value is a Tensor, the loss scale can be modified by |

| RETURNS | DESCRIPTION |
|---|---|
| | Tuple of 3 Tensors: the loss, the overflow flag, and the current loss scale value. |
| | loss (Tensor) - a scalar, the loss value. |
| | overflow (Tensor) - a scalar of type bool, whether overflow occurred. |
| | loss scale (Tensor) - the loss scale value, a scalar. |

Source code in mindocr\utils\train_step_wrapper.py (lines 29–161)
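Dynamic loss scaling conventionally shrinks the scale on overflow and grows it after a window of overflow-free steps. A sketch of that policy; the factor and window constants are assumptions, not MindOCR defaults:

```python
def adjust_loss_scale(scale, overflow, good_steps, factor=2.0, window=1000):
    """Return (new_scale, new_good_steps) after one training step."""
    if overflow:
        # Shrink on overflow and reset the good-step counter.
        return max(scale / factor, 1.0), 0
    good_steps += 1
    if good_steps >= window:
        # Grow after a run of overflow-free steps.
        return scale * factor, 0
    return scale, good_steps
```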
mindocr.version
¶
version init